Past Events

2026

Statistics Colloquium: Cynthia Rudin

11:30 am–12:30 pm Jones 303

Cynthia Rudin
Department of Computer Science
Duke University

Title: Many Good Models Leads To…

Abstract: As it turns out, many good models lead to amazing things! The Rashomon Effect, coined by Leo Breiman, describes the phenomenon that there exist many equally good predictive models for the same dataset. This phenomenon happens for many real datasets, and when it does, it sparks both magic and consternation, but mostly magic. In light of the Rashomon Effect, my collaborators and I propose to reshape the way we think about machine learning, particularly for tabular data problems in the nondeterministic (noisy) setting. I’ll address how the Rashomon Effect impacts (1) the existence of simple-yet-accurate models, (2) flexibility to address user preferences, such as fairness and monotonicity, without losing performance, (3) algorithm choice, specifically, providing advance knowledge of which algorithms might be suitable for a given problem, (4) public policy, and (5) scientific discovery. I’ll also discuss a theory of when the Rashomon Effect occurs and why: interestingly, noise in data leads to a large Rashomon Effect. My goal is to illustrate how the Rashomon Effect can have a massive impact on the use of machine learning for complex problems in society.

I’ll be mainly discussing the paper “Amazing Things Come From Having Many Good Models” (ICML spotlight, 2024) which is joint work with Chudi Zhong, Lesia Semenova, Margo Seltzer, Ronald Parr, Jiachang Liu, Srikar Katta, Jon Donnelly, Harry Chen, and Zachery Boner.
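The claim that noise in the data enlarges the Rashomon set can be illustrated with a toy simulation. This is a minimal sketch, not material from the talk or paper: it uses synthetic 1-D data with a known true rule, takes simple threshold classifiers as the model class, and counts how many models land within a small accuracy margin of the best one.

```python
import random

def rashomon_count(label_noise, n=5000, eps=0.03, seed=0):
    # Synthetic 1-D data: the true rule is "x > 0.5"; each label is
    # flipped independently with probability label_noise.
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [(x > 0.5) != (rng.random() < label_noise) for x in xs]
    # Model class: threshold classifiers "predict x > t" for t on a grid.
    thresholds = [i / 100 for i in range(101)]
    accs = [sum((x > t) == y for x, y in zip(xs, ys)) / n for t in thresholds]
    best = max(accs)
    # Size of the empirical Rashomon set: models within eps of the best.
    return sum(a >= best - eps for a in accs)

print(rashomon_count(0.05))  # low noise: few near-optimal thresholds
print(rashomon_count(0.40))  # high noise: many near-optimal thresholds
```

With little label noise, accuracy drops off steeply away from the optimal threshold, so few models are near-optimal; with heavy noise the accuracy curve flattens and many thresholds tie within eps, which is the qualitative behavior the abstract describes.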

Mar 23

Joint Statistics and DSI Colloquium: Soledad Villar

2:00–3:00 pm DSI 105

Soledad Villar
Assistant Professor
Johns Hopkins University

Title: Machine Learning and Symmetries

Abstract: Symmetries play a significant role in machine learning. In scientific applications, they often arise as constraints imposed by physical laws. More broadly, symmetries emerge whenever the same object admits multiple representations (for example, in graph machine learning). In addition, modern machine learning models are heavily overparameterized, so many distinct sets of parameters can represent the same function, revealing further underlying symmetries.

In this talk, we describe methods for incorporating symmetries into machine learning models using classical tools from algebra, including invariant theory and Galois theory. A particularly interesting feature of symmetry-preserving models is that they can be defined independently of the size or dimension of the input. The formalization of this setting, known as any-dimensional machine learning, is inspired by ideas from representation stability. In this talk we present a theoretical framework for understanding the assumptions imposed by such models, which allows us to align learning models with data of varying sizes and learning tasks in a principled way.

Any-dimensional models use a fixed set of parameters and can be evaluated on data of varying sizes. Hyperparameter transfer considers the complementary setting, in which the data are fixed while the model size varies, and studies how optimal hyperparameters (such as the learning rate) can be transferred from smaller models to larger ones. If time permits, we will also discuss recent connections between any-dimensional machine learning and hyperparameter transfer.
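A minimal illustration of a model with "a fixed set of parameters that can be evaluated on data of varying sizes" is a sum-pooled, permutation-invariant function in the DeepSets style. This sketch is not Villar's construction; the parameter values and function names are invented for illustration.

```python
import math

# A permutation-invariant, any-size model: f(x_1..x_n) = rho(sum_i phi(x_i)).
# The same fixed parameters apply to inputs of any size n, and the output
# is unchanged under reordering of the inputs.
PHI_W, RHO_W = 0.5, 2.0  # hypothetical fixed parameters

def phi(x):
    return math.tanh(PHI_W * x)

def rho(s):
    return RHO_W * s

def f(xs):
    return rho(sum(phi(x) for x in xs))

print(f([1.0, 2.0, 3.0]))       # size-3 input
print(f([3.0, 1.0, 2.0]))       # same multiset, same output
print(f([1.0, 2.0, 3.0, 4.0]))  # size-4 input, same parameters
```

The sum pooling is what makes the model both permutation-invariant and well-defined for every input size, at the cost of restricting which functions it can represent; characterizing such restrictions is part of what the any-dimensional framework formalizes.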

Mar 5

Student Seminar: Chengran Yang

2:00–2:30 pm Jones 111

Thursday, March 5, 2026, at 2:00 PM, in Jones 111, 5747 S. Ellis Avenue
Master’s Thesis Presentation
Chengran Yang, Department of Statistics, The University of Chicago
“Can Machine Learning Learn Weak Signals? Extending to Binary Logistic Models”

Mar 5

Student Seminar: Sili (Shelly) Wang

9:00–9:30 am Jones 111

Wednesday, March 4, 2026, at 9:00 AM, in Jones 111, 5747 S. Ellis Avenue
Master’s Thesis Presentation
Sili (Shelly) Wang, Department of Statistics, The University of Chicago
“TBA”

Mar 4

Billingsley Lectures on Probability: Christophe Garban

5:00–6:00 pm Kent 120

Reception immediately following the lecture at 6:10 pm, in Jones 111, 5747 S Ellis Ave.

Christophe Garban
Université Lyon 1 / Courant Institute, NYU

Title: Continuous Symmetry and Phase Transitions in Lattice Spin Systems

Abstract: A central problem in statistical physics is to understand how spins placed on the lattice Z^d interact and collectively organize at different temperatures. When the spins take values in a discrete set — for instance in the celebrated Ising model, where σ_x ∈ {−1, +1} — the mechanisms governing phase transitions are by now relatively well understood.

The situation changes dramatically when the spins take values in a continuous space, such as the unit circle S^1 in the XY model or the unit sphere S^2 in the classical Heisenberg model. In this setting, new phenomena appear, and the behavior depends strongly on whether the underlying symmetry is Abelian or non-Abelian. In particular, the non-Abelian case remains far more mysterious.

In this talk, I will introduce the mathematics of spin systems with continuous symmetry, emphasizing their deep connections with analysis, including harmonic functions, harmonic maps, and geometric analysis. I will also describe some recent results and open problems in the area.

No prior background in statistical physics or probability will be assumed. Based on joint works with J. Aru, D. van Engelenburg, P. Dario, N. de Montgolfier, A. Sepúlveda and T. Spencer.

Feb 26

Joint Statistics and DSI Colloquium: Jiaqi Zhang

4:00–5:00 pm DSI 105

Jiaqi Zhang
PhD Candidate
Massachusetts Institute of Technology

Title: Modeling Large-Scale Interventions

Abstract: Complex causal mechanisms among genes govern cellular functions in health and disease. Understanding these mechanisms can accelerate therapeutic discovery but remains challenging due to the large number of genes and their intricate dependencies. Recent advances in experimental technologies are making this problem increasingly tractable: it is now possible to systematically intervene on individual genes or gene combinations in single cells and measure their downstream effects, enabling empirical identification and validation of causal relationships. However, interventional data are high-dimensional, which makes them challenging to interpret, and costly to collect.

In this talk, I will present our work tackling these challenges from three aspects. First, we introduced causal representation theories and algorithms with identifiability guarantees to uncover latent variables behind high-dimensional data. Second, we developed a method to model interventional data that can predict the effects of novel interventions with high accuracy, incorporating both distributional shifts and prior domain knowledge. Finally, we showed how predictive intervention modeling can improve future experimental design, illustrated by an application where we predicted and validated previously unknown T-cell regulators with therapeutic potential for cancer immunotherapy.

Feb 26

Student Seminar: Brian Ping-Huan Wu

3:30–4:00 pm Jones 304

Tuesday, February 24, 2026, at 3:30 PM, in Jones 304, 5747 S. Ellis Avenue
Master’s Thesis Presentation
Brian Ping-Huan Wu, Department of Statistics, The University of Chicago
“Fast Estimation and Valid Statistical Inference for Mixed-Effect Location-Scale Models Using Variational Inference”

Feb 24

Joint Statistics and DSI Colloquium: Mateo Díaz

11:30 am–12:30 pm DSI 105

Mateo Díaz
Assistant Professor
Department of Applied Mathematics and Statistics
Mathematical Institute for Data Science
Johns Hopkins University

Title: Leveraging Structure for Faster Algorithms in Optimization and Diffusion

Abstract: Large-scale iterative methods drive modern AI, yet their theoretical foundations often lag behind their empirical success. We argue that bridging this gap requires identifying the inherent problem structure that enables these algorithms to perform well. This talk instantiates this principle across two domains: optimization and generative modeling.

First, we derive new theoretical guarantees for the Levenberg–Morrison–Marquardt method. Although this method is ubiquitous in settings that demand highly accurate solutions—for instance, when training physics-informed neural networks for scientific discovery—classical guarantees do not explain its strong empirical performance in modern overparameterized, ill-conditioned regimes. By reframing it through the lens of composite optimization, we uncover geometric conditions that ensure fast convergence even in these challenging modern regimes.

Second, we introduce Proximal Diffusion Models (PDM). While standard diffusion models rely on score-matching and forward discretization, we demonstrate that a backward discretization using proximal maps offers significant theoretical and practical advantages. Under mild conditions, we prove that PDM achieves ε-accuracy in KL divergence within Õ(d/√ε) steps and empirically demonstrate that it outperforms conventional methods using fewer sampling iterations.
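The proximal maps underlying backward discretizations are standard objects: prox_{λf}(v) = argmin_x f(x) + (1/(2λ))(x − v)^2. As a self-contained illustration (a textbook example, not PDM itself), the prox of f(x) = |x| has the familiar soft-threshold closed form.

```python
def prox_abs(v, lam):
    # Proximal map of f(x) = |x|:
    #   prox_{lam f}(v) = argmin_x |x| + (1/(2*lam)) * (x - v)**2,
    # whose closed form is the soft-threshold operator.
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

print(prox_abs(3.0, 1.0))   # 2.0: shrink toward zero by lam
print(prox_abs(-0.5, 1.0))  # 0.0: small inputs are thresholded to zero
```

A backward (implicit) step replaces a gradient evaluation with one such prox solve per iteration; the prox is well-defined even where f is nonsmooth, which is one source of the stability advantages the abstract alludes to.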

Feb 23

DSI Distinguished Speaker Series: Jeffrey Heer

12:30–2:30 pm DSI 105

Jeffrey Heer
Jerre D. Noe Endowed Professor of Computer Science & Engineering
University of Washington

Title: Augmenting Data Scientists: The Promise and Peril of AI-Assisted Analysis

Abstract: Data analysis is a rich sensemaking process, with frequent shifts among data representations, tools, and both conceptual & mathematical models. Computational methods can go beyond fitting models and rendering charts to make in-context recommendations and even guide end-to-end analysis workflows. How does the design of such tools affect people’s exploration, modeling, and understanding of data? In this talk, we will consider methods for augmenting data science work by integrating proactive computational support into interactive tools, with the goal of providing algorithmic assistance to augment and enrich, rather than replace, people’s intellectual work. Across tasks such as data transformation, visualization, and statistical modeling, we apply artificial intelligence to bridge gaps between user intent and robust analysis results. At the same time, we need to pay careful attention to ways these methods may exacerbate bias, foster dependence, and pose challenges for the future of data analysis.

Feb 20

Student Seminar: Boxuan Zhang

3:00–3:30 pm Jones 111

Thursday, February 19, 2026, at 3:00 PM, in Jones 111, 5747 S. Ellis Avenue
Master’s Thesis Presentation
Boxuan Zhang, Department of Statistics, The University of Chicago
“Conformal Prediction for Bayesian Posterior”

Feb 19