Posts Tagged ‘PyData2024’

Predictive Modeling and the Illusion of Signal

Introduction

Vincent Warmerdam delves into the illusions often encountered in predictive modeling, highlighting the cognitive traps and statistical misconceptions that lead to overconfidence in model performance.

The Seduction of Spurious Correlations

Models often perform well on training data by exploiting noise rather than genuine signal. Vincent emphasizes critical thinking and statistical rigor to avoid being misled by deceptively strong results.
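To make the point concrete, here is a minimal sketch (not taken from the talk) of how searching through enough noise produces a "strong" result. Everything in it is an assumption chosen for illustration: the sample size, the number of candidate features, and the use of Pearson correlation as the score.

```python
import math
import random

random.seed(0)

N_SAMPLES = 40    # observations, split half training / half held-out
N_FEATURES = 500  # candidate features, every one of them pure noise


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Target and features are independent Gaussian noise: there is no signal.
target = [random.gauss(0, 1) for _ in range(N_SAMPLES)]
features = [[random.gauss(0, 1) for _ in range(N_SAMPLES)]
            for _ in range(N_FEATURES)]

half = N_SAMPLES // 2
train_y, test_y = target[:half], target[half:]


def train_score(feature):
    """Absolute correlation with the target on the training half only."""
    return abs(pearson(feature[:half], train_y))


# Select the feature that looks best on the training half.
best = max(features, key=train_score)

in_sample = train_score(best)
out_of_sample = abs(pearson(best[half:], test_y))

print(f"best |r| on the training half:  {in_sample:.2f}")
print(f"same feature on held-out half: {out_of_sample:.2f}")
```

The winning feature looks impressive only because it was picked from hundreds of candidates on the same data used to score it; the held-out half is the honest measurement.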

Building Robust Models

Using robust cross-validation, considering domain knowledge, and testing against out-of-sample data are vital strategies to counteract the illusion of predictive prowess.
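As a rough illustration of why out-of-sample testing matters, the sketch below hand-rolls k-fold cross-validation around a 1-nearest-neighbour classifier that simply memorizes its training data. The data (one noise feature, coin-flip labels), the model, and the helper names are all hypothetical choices for this example; in practice a library such as scikit-learn would handle the splitting.

```python
import random

random.seed(1)

# Hypothetical data: one noise feature and coin-flip labels, so no signal.
X = [random.random() for _ in range(200)]
y = [random.randint(0, 1) for _ in range(200)]


def predict_1nn(train_X, train_y, x):
    """Return the label of the training point closest to x."""
    nearest = min(range(len(train_X)), key=lambda i: abs(train_X[i] - x))
    return train_y[nearest]


def accuracy(train_X, train_y, eval_X, eval_y):
    """Fraction of evaluation points whose label is predicted correctly."""
    hits = sum(predict_1nn(train_X, train_y, x) == label
               for x, label in zip(eval_X, eval_y))
    return hits / len(eval_X)


def k_fold_accuracy(X, y, k=5):
    """Average accuracy over k folds, each fold held out exactly once."""
    indices = list(range(len(X)))
    random.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in indices if i not in held]
        scores.append(accuracy(
            [X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in held_out], [y[i] for i in held_out],
        ))
    return sum(scores) / k


train_acc = accuracy(X, y, X, y)     # scored on the data it memorized
cv_acc = k_fold_accuracy(X, y, k=5)  # scored on folds it never saw

print(f"training accuracy:        {train_acc:.2f}")
print(f"cross-validated accuracy: {cv_acc:.2f}")
```

The memorizing model is perfect on its own training data, while the cross-validated score hovers around chance, which is the only honest answer when the labels are random.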

Conclusion

Data science is not just coding and modeling — it requires constant skepticism, critical evaluation, and humility. Vincent reminds us to stay vigilant against the comforting but dangerous mirage of false predictability.

Building Intelligent Data Products at Scale

Introduction

Thomas Vachon shares insights into scaling data-driven products, blending machine learning, engineering, and user-centric design to create impactful and intelligent applications.

Key Ingredients for Success

Building intelligent products requires aligning data pipelines, model training, deployment infrastructure, and feedback loops. Vachon stresses the importance of cross-functional collaboration between data scientists, software engineers, and product teams.

Real-World Lessons

From architectural best practices to team organization strategies, Vachon illustrates how to navigate the complexity of scaling data initiatives sustainably.

Conclusion

Intelligent data products demand not only technical excellence but also thoughtful design, scalability planning, and user empathy from day one.

Boosting AI Reliability: Uncertainty Quantification with MAPIE

Introduction

Thierry Cordier and Valentin Laurent introduce MAPIE, a Python library within scikit-learn-contrib, designed for uncertainty quantification in machine learning models.

MAPIE on GitHub

Managing Uncertainty in Machine Learning

In AI applications, from autonomous vehicles to medical diagnostics, understanding prediction uncertainty is crucial. MAPIE uses conformal prediction methods to turn point predictions into prediction intervals (or, for classifiers, prediction sets) that contain the true value with a user-chosen probability, provided the data are exchangeable. This makes AI systems safer and easier to interpret.
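The idea behind conformal prediction can be sketched without any library. The example below is a hand-rolled split conformal procedure for regression, written for illustration only: it does not use MAPIE's API, and the synthetic data, the least-squares line, and the 90% target are all assumptions.

```python
import math
import random

random.seed(42)


def make_data(n):
    """Hypothetical data: y = 2x + 1 plus standard Gaussian noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + random.gauss(0, 1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


train_x, train_y = make_data(200)   # used to fit the model
calib_x, calib_y = make_data(200)   # used only to calibrate interval width
test_x, test_y = make_data(1000)    # used only to check coverage

slope, intercept = fit_line(train_x, train_y)


def predict(x):
    return slope * x + intercept


alpha = 0.1  # target miscoverage: aim for at least 90% coverage

# Conformity scores: absolute residuals on the calibration set.
scores = sorted(abs(y - predict(x)) for x, y in zip(calib_x, calib_y))

# Finite-sample quantile: the ceil((n + 1)(1 - alpha))-th smallest score.
# If that rank exceeds n, the only valid interval is unbounded.
n = len(scores)
rank = math.ceil((n + 1) * (1 - alpha))
half_width = scores[rank - 1] if rank <= n else math.inf

# Every prediction gets the interval [predict(x) - half_width, predict(x) + half_width].
covered = sum(abs(y - predict(x)) <= half_width
              for x, y in zip(test_x, test_y))
coverage = covered / len(test_x)

print(f"interval half-width: {half_width:.2f}")
print(f"empirical coverage on test data: {coverage:.3f}")
```

Note that the calibration set is kept separate from the training set; reusing training residuals would make the intervals too narrow. Libraries such as MAPIE wrap this bookkeeping and offer more refined variants.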

Key Features

MAPIE supports regression, classification, time series forecasting, and complex tasks like multi-label classification and semantic segmentation. It integrates seamlessly with scikit-learn, TensorFlow, PyTorch, and custom models.

Real-World Use Cases

By generating calibrated prediction intervals and prediction sets, MAPIE supports selective classification and robust decision-making under uncertainty, and it provides the statistical guarantees that safety-critical AI systems require.

Conclusion

MAPIE empowers data scientists to quantify uncertainty elegantly, bridging the gap between predictive power and real-world reliability.

[PyData Paris 2024] Exploring Quarto Dashboard for Impactful and Visual Communication

Introduction

Christophe Dervieux introduces Quarto Dashboard, the dashboard output format of Quarto, an open-source scientific and technical publishing system. Working directly from Jupyter notebooks, Quarto makes it possible to build interactive charts, dashboards, and dynamic narratives for impactful visual communication.

Building Visual Communication with Quarto

Quarto extends standard markdown with advanced features tailored for scientific writing. It offers support for multiple computation engines, allowing narratives and executable code to merge into various outputs: PDF, HTML pages, websites, books, and especially dashboards. The dashboard format enhances data communication by organizing visual metrics in an efficient and impactful layout.

Rendering a Jupyter notebook with Quarto is simple: a single command-line instruction (quarto render) produces a polished, shareable dashboard. Editor extensions for VS Code, JupyterLab, and Positron streamline the experience further.

Dashboard Features and Design

Dashboards in Quarto organize content using components like cards, rows, columns, sidebars, and tabs. Each element structures visual outputs such as plots, tables, and value boxes for maximum clarity. Customization is straightforward, relying on YAML configuration and Bootstrap-based theming. Users can add multi-page navigation, bring in interactivity through JavaScript libraries, and adapt layouts for specific audiences.
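As a small sketch of how these pieces fit together, the script below writes a minimal dashboard source file and, if the Quarto CLI happens to be installed, can render it with quarto render. The file name, titles, and values are hypothetical, and the layout (one row holding a value box and a plot card) is just one possible arrangement; the plot cell assumes matplotlib is available when the dashboard is rendered.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Code-fence marker for the .qmd cells, built programmatically so that this
# example can itself sit inside a fenced block.
FENCE = "`" * 3


def dashboard_source(title):
    """Return the text of a minimal .qmd dashboard: one row, two cards."""
    lines = [
        "---",
        f'title: "{title}"',
        "format: dashboard",      # selects Quarto's dashboard output format
        "---",
        "",
        "## Row",                 # a level-2 heading starts a new row
        "",
        FENCE + "{python}",       # first card: a value box
        "#| content: valuebox",
        '#| title: "Observations"',
        "dict(value=128)",
        FENCE,
        "",
        FENCE + "{python}",       # second card: a plot
        '#| title: "Squares"',
        "import matplotlib.pyplot as plt",
        "plt.plot([x * x for x in range(10)])",
        FENCE,
        "",
    ]
    return "\n".join(lines)


def render(path):
    """Run `quarto render` if the CLI is on PATH; return True on success."""
    if shutil.which("quarto") is None:
        return False
    return subprocess.run(["quarto", "render", str(path)]).returncode == 0


# Write the source to a temporary directory (hypothetical file name).
qmd_path = Path(tempfile.mkdtemp()) / "dashboard.qmd"
qmd_path.write_text(dashboard_source("Demo dashboard"), encoding="utf-8")
print(f"wrote {qmd_path}")
```

Calling render(qmd_path) afterwards would produce the HTML dashboard next to the source file on a machine where Quarto and a Python Jupyter kernel are set up.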

Recent updates even enable branding dashboards easily with SCSS themes, making Quarto ideal for both scientific and corporate environments.

Conclusion

Quarto revolutionizes technical communication by enabling scientists and analysts to produce professional-grade dashboards and publications with little effort. Christophe’s session at PyData Paris 2024 showcased the simplicity, power, and flexibility Quarto brings to modern data storytelling.

Onyxia: A User-Centric Interface for Data Scientists in the Cloud Age

Introduction

The team from INSEE presents Onyxia, an open-source, Kubernetes-based platform designed to offer flexible, collaborative, and powerful cloud environments for data scientists.

Rethinking Data Science Infrastructure

Traditional local development faces issues like configuration divergence, data duplication, and limited compute resources. Onyxia solves these by offering isolated namespaces, integrated object storage, and a seamless user interface that abstracts Kubernetes and S3 complexities.

Versatile Deployment

With a few clicks, users can launch preconfigured environments — including Jupyter notebooks, VS Code, Postgres, and MLflow — empowering fast innovation without heavy IT overhead. Organizations can extend Onyxia by adding custom services, ensuring future-proof, evolvable data labs.

Success Stories

Adopted across French universities and research labs, Onyxia enables students and professionals alike to work in secure, scalable, and fully-featured environments without managing infrastructure manually.

Conclusion

Onyxia democratizes access to powerful cloud tools for data scientists, streamlining collaboration and fostering innovation.