Past Events

2026

Department of Computer Science and Data Science Institute Presents: Benjamin Laufer

2:30–3:30 pm DSI 105

Benjamin Laufer
PhD Candidate
Cornell Tech

Title: “AI Ecosystems: Structure, Strategy, Risk and Regulation”

Abstract: Machine learning (ML) and artificial intelligence (AI) systems are not standalone artifacts: they are embedded in ecosystems where foundation models are adapted and deployed through layered pipelines spanning developers, platforms, users and regulators. This talk explores how the structure of these ecosystems shapes the distribution of value and risk, and determines system-level properties like safety and fairness. I begin with a game-theoretic model of the interaction between general-purpose producers and domain specialists, using it to examine how regulatory design shapes incentives and equilibrium behaviors. I then connect these formal insights to empirical measurements from 1.86 million open-source AI models, reconstructing lineage networks to quantify how behaviors and failures propagate through fine-tuning. Finally, zooming in from the aggregate structure of the ecosystem to the design of the algorithms themselves, I describe my work in algorithmic fairness, framing the identification of less discriminatory algorithms as a search problem with provable statistical guarantees. I close by outlining a forward-looking research agenda aimed at building both the technical infrastructure and policy mechanisms required to steer AI ecosystems toward robust, accountable and democratic outcomes.
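To make the lineage-network idea concrete, here is a minimal sketch (not the talk's actual pipeline): assuming each model's metadata records its parent via a hypothetical base_model field, the ecosystem becomes a directed graph, and the models reachable from a foundation model are exactly those a propagated behavior or failure could touch.

```python
# Minimal sketch of reconstructing a fine-tuning lineage network.
# The records and their "base_model" field are hypothetical stand-ins
# for model-hub metadata; the talk's actual pipeline is far larger.
import networkx as nx

records = [
    {"model": "base-7b", "base_model": None},
    {"model": "base-7b-chat", "base_model": "base-7b"},
    {"model": "base-7b-chat-medqa", "base_model": "base-7b-chat"},
    {"model": "base-7b-code", "base_model": "base-7b"},
]

lineage = nx.DiGraph()
for r in records:
    lineage.add_node(r["model"])
    if r["base_model"] is not None:
        # Edge points from parent model to fine-tuned child.
        lineage.add_edge(r["base_model"], r["model"])

# Every model downstream of a given foundation model: the set through
# which a behavior (or failure) introduced there could propagate.
print(sorted(nx.descendants(lineage, "base-7b")))
```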

Jan 26

CAM Colloquium: Eitan Levin

4:00–5:00 pm Jones 303

Eitan Levin
Applied and Computational Mathematics
California Institute of Technology

Title: “Any-Dimensional Data Science”

Abstract: Many applications throughout data science require methods that are well-defined and performant for problems or data of any size. In machine learning, we are given training data from which we wish to learn algorithms capable of solving problems of any size. In particular, the learned algorithm must generalize to inputs of sizes that are not present in the training set. For example, algorithms for processing graphs or point clouds must generalize to inputs with any number of nodes or points. A second challenge pertaining to any-dimensionality arises in applications such as game theory or network statistics, in which we wish to characterize solutions to problems of growing size. Examples include computing values of games with any number of players, or proving moment inequalities for random vectors and graphs of any size. From an optimization perspective, this amounts to deriving bounds that hold for entire sequences of problems of growing dimensionality. Finally, in applications involving graph-valued data, we wish to produce constant-sized summaries of arbitrarily large networks that preserve their essential structural properties. These summaries can then be used to efficiently test properties of the underlying large network; for example, testing for the presence of hubs is of interest in massive biological and traffic networks. We develop a unified framework to tackle such any-dimensional problems by using random sampling maps to compare and summarize objects of different sizes. Our methodology leverages new de Finetti-type theorems and the recently identified phenomenon of representation stability. We illustrate the resulting framework for any-dimensional problems in several applications.
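As a toy illustration of summarizing an arbitrarily large object with random sampling maps (a sketch under made-up assumptions, not the framework from the talk): repeatedly sample fixed-size vertex subsets of a large graph and record the induced edge density, yielding a summary whose size does not depend on the size of the graph.

```python
# Minimal sketch: a constant-sized summary of a large graph, built by
# repeatedly sampling k vertices and recording the edge density of the
# induced subgraph. The random graph below is a made-up example.
import numpy as np

rng = np.random.default_rng(0)
n = 2_000                           # the graph could be any size
adj = rng.random((n, n)) < 0.01     # ~1% edge density
adj = np.triu(adj, 1)
adj = adj | adj.T                   # symmetric, zero diagonal

def sampled_summary(adj, k=20, n_samples=500):
    """Empirical distribution of induced edge densities over k-subsets."""
    n = adj.shape[0]
    densities = []
    for _ in range(n_samples):
        idx = rng.choice(n, size=k, replace=False)
        sub = adj[np.ix_(idx, idx)]
        densities.append(sub.sum() / (k * (k - 1)))  # ordered vertex pairs
    return np.array(densities)

# Summary size depends only on k and n_samples, never on n.
summary = sampled_summary(adj)
print(summary.mean())               # ~0.01, the global edge density
```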

Jan 22

CAM Colloquium: Yijun Dong

4:00–5:00 pm Jones 303

Yijun Dong
Courant Institute of Mathematical Sciences
New York University

Title: “Understanding Post-training through the Lens of Intrinsic Dimension”

Abstract: Post-training is becoming the primary interface between powerful pre-trained models and challenging real-world problems, where we aim to adapt large pre-trained models via limited, heterogeneous data while preserving their capabilities and reliability. In this talk, we take a step toward a unified theoretical and algorithmic framework for post-training through the lens of intrinsic dimension. In particular, we focus on an emerging post-training phenomenon, weak-to-strong (W2S) generalization, in which a strong pre-trained student model fine-tuned only with supervision from a weaker teacher model can often outperform its teacher. Theoretically, we explain when and why W2S generalization occurs from a sample-efficiency perspective, reveal the value of teacher-student discrepancy for W2S, and investigate the effects of systematic biases on W2S. Algorithmically, we propose a practical, theory-inspired remedy for W2S under spurious correlation. The talk will conclude with an outlook on the broad applications of random matrix tools for understanding and improving post-training.
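For readers unfamiliar with the W2S setup, a toy sketch follows (the model and data choices are arbitrary stand-ins, not the methods from the talk): a weak, mis-specified teacher is fit on labeled data, its pseudo-labels supervise a more expressive student, and both are scored against held-out ground truth.

```python
# Toy weak-to-strong (W2S) setup: a weak teacher pseudo-labels unlabeled
# data for a stronger student, which never sees true labels. Models and
# data are arbitrary illustrations, not the methods from the talk.
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier

X_lab, y_lab = make_moons(n_samples=200, noise=0.3, random_state=0)
X_unlab, _ = make_moons(n_samples=2000, noise=0.3, random_state=1)
X_test, y_test = make_moons(n_samples=2000, noise=0.3, random_state=2)

# Weak teacher: a linear model, mis-specified for this nonlinear task.
teacher = LogisticRegression().fit(X_lab, y_lab)
pseudo = teacher.predict(X_unlab)

# Strong student: a more expressive model trained only on pseudo-labels.
student = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=2000,
                        random_state=0).fit(X_unlab, pseudo)

print("teacher acc:", accuracy_score(y_test, teacher.predict(X_test)))
print("student acc:", accuracy_score(y_test, student.predict(X_test)))
# Whether the student exceeds its teacher depends on its inductive bias
# relative to the teacher's errors -- the regime the talk characterizes.
```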

Jan 15

Joint Statistics and DSI Colloquium: Tudor Manole

11:30 am–12:30 pm DSI 105

Tudor Manole
Institute for Data, Systems, and Society
Massachusetts Institute of Technology

Title: “A Statistical Framework for Benchmarking Quantum Computers”

Abstract: Recent years have witnessed quantum computing technologies increasingly move from theoretical proposals to functioning experimental platforms, reaching major milestones such as the demonstration of beyond-classical computational tasks. Despite these exciting advances, current quantum computers experience hardware-level errors which limit their scalability, and which must be carefully identified before they can be mitigated. In this talk, I will develop a statistical framework for characterizing errors in quantum devices, using an existing experimental platform known as random circuit sampling. Data arising from this experiment can be described through a high-dimensional discrete latent variable model parametrized by the device’s error rates. We develop estimators for these error rates which are provably consistent even for large-scale quantum devices. We then apply our methods to benchmark a recent state-of-the-art quantum processor, obtaining a detailed report of error rates which were largely unavailable from past studies. I will close by placing these results in the broader context of my interdisciplinary work in the physical sciences, and by discussing some of my other research interests in nonparametric statistics and statistical optimal transport.
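For background, the simplest fidelity estimator associated with random circuit sampling is linear cross-entropy benchmarking (XEB), which estimates fidelity as F ≈ 2^n ⟨p_ideal(x)⟩ − 1, averaged over observed bitstrings x. The sketch below computes it on synthetic data; the talk's latent-variable estimators of per-component error rates go well beyond this single number.

```python
# Sketch of the linear cross-entropy benchmarking (XEB) fidelity estimate
# used in random circuit sampling: F ~= 2**n * mean(p_ideal(x)) - 1, with
# the mean taken over bitstrings x sampled from the (noisy) device. The
# "ideal" distribution here is synthetic; real experiments compute it by
# classically simulating the circuit.
import numpy as np

rng = np.random.default_rng(0)
n_qubits = 10
dim = 2 ** n_qubits

# Porter-Thomas-like ideal output probabilities of a random circuit.
p_ideal = rng.exponential(scale=1.0 / dim, size=dim)
p_ideal /= p_ideal.sum()

def xeb_fidelity(samples, p_ideal, dim):
    """Linear XEB estimate from device samples (bitstring indices)."""
    return dim * p_ideal[samples].mean() - 1.0

# Toy noisy device: with probability f it samples ideally, otherwise it
# emits uniform noise -- a simple depolarizing-style error model.
f_true = 0.7
n_shots = 200_000
ideal_draws = rng.choice(dim, size=n_shots, p=p_ideal)
noise_draws = rng.integers(dim, size=n_shots)
is_ideal = rng.random(n_shots) < f_true
samples = np.where(is_ideal, ideal_draws, noise_draws)

print(xeb_fidelity(samples, p_ideal, dim))  # recovers roughly f_true = 0.7
```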

Jan 12

2025

Student Seminar: Raphael Rossellini

11:00 am–12:30 pm DSI 322

Monday, December 15, 2025, at 11:00 AM, in DSI 322, 5460 S. University Ave
Dissertation Proposal Presentation
Raphael Rossellini, Department of Statistics, The University of Chicago
“Testing and ensuring calibration for decision-makers”

Dec 15

Student Seminar: Or Goldreich

10:00–11:30 am Jones 111

Monday, December 15, 2025, at 10:00 AM, in Jones 111, 5747 S. Ellis Avenue
Dissertation Proposal Presentation
Or Goldreich, Department of Statistics, The University of Chicago
“TBA”

Dec 15

Student Seminar: Jimmy Lederman

2:00–3:30 pm Jones 111

Wednesday, December 10, 2025, at 2:00 PM, in Jones 111, 5747 S. Ellis Avenue
Dissertation Proposal Presentation
Jimmy Lederman, Department of Statistics, The University of Chicago
“Count-Based Data Augmentation for Flexible Probabilistic Modeling of Nonstandard Data”

Dec 10

Student Seminar: Benedetta Bruni

9:00–10:30 am DSI 322

Wednesday, December 10, 2025, at 9:00 AM, in DSI 322, 5460 S. University Ave
Dissertation Proposal Presentation
Benedetta Bruni, Department of Statistics, The University of Chicago
“A Generalized Bayesian Approach to Tree Models for Densities”

Dec 10

Student Seminar: Jeonghwan Lee

2:30–4:00 pm Jones 304

Tuesday, December 9, 2025, at 2:30 PM, in Jones 304, 5747 S. Ellis Avenue
Dissertation Proposal Presentation
Jeonghwan Lee, Department of Statistics, The University of Chicago
“Topics in modern statistical learning: Distribution shift and learning with synthetic data”

Dec 9

Student Seminar: Qi Chen

10:30–11:00 am Jones 111

Tuesday, December 9, 2025, at 10:30 AM, in Jones 111, 5747 S. Ellis Avenue
Master’s Thesis Presentation
Qi Chen, Department of Statistics, The University of Chicago
“Graphical Model Geometry-Aware Hamiltonian Variational Auto-Encoder”

Dec 9