2026

Department of Computer Science and Data Science Institute Presents: Benjamin Laufer
2:30–3:30 pm DSI 105
Benjamin Laufer
PhD Candidate
Cornell Tech
Title: “AI Ecosystems: Structure, Strategy, Risk and Regulation”
Abstract: Machine learning (ML) and artificial intelligence (AI) systems are not standalone artifacts: they are embedded in ecosystems where foundation models are adapted and deployed through layered pipelines spanning developers, platforms, users and regulators. This talk explores how the structure of these ecosystems shapes the distribution of value and risk, and determines system-level properties like safety and fairness. I begin with a game-theoretic model of the interaction between general-purpose producers and domain specialists, using it to examine how regulatory design shapes incentives and equilibrium behaviors. I then connect these formal insights to empirical measurements from 1.86 million open-source AI models, reconstructing lineage networks to quantify how behaviors and failures propagate through fine-tuning. Finally, zooming in from the aggregate structure of the ecosystem to the design of the algorithms themselves, I describe my work in algorithmic fairness, framing the identification of less discriminatory algorithms as a search problem with provable statistical guarantees. I close by outlining a forward-looking research agenda aimed at building both the technical infrastructure and policy mechanisms required to steer AI ecosystems toward robust, accountable and democratic outcomes.

CAM Colloquium: Eitan Levin
4:00–5:00 pm Jones 303
Eitan Levin
Applied and Computational Mathematics
California Institute of Technology
Title: “Any-Dimensional Data Science”
Abstract: Many applications throughout data science require methods that are well-defined and performant for problems or data of any size. In machine learning, we are given training data from which we wish to learn algorithms capable of solving problems of any size. In particular, the learned algorithm must generalize to inputs of sizes that are not present in the training set. For example, algorithms for processing graphs or point clouds must generalize to inputs with any number of nodes or points. A second challenge pertaining to any-dimensionality arises in applications such as game theory or network statistics, in which we wish to characterize solutions to problems of growing size. Examples include computing values of games with any number of players, or proving moment inequalities for random vectors and graphs of any size. From an optimization perspective, this amounts to deriving bounds that hold for entire sequences of problems of growing dimensionality. Finally, in applications involving graph-valued data, we wish to produce constant-sized summaries of arbitrarily large networks that preserve their essential structural properties. These summaries can then be used to efficiently test properties of the underlying large network; for example, testing for the presence of hubs is of interest in massive biological and traffic networks. We develop a unified framework to tackle such any-dimensional problems by using random sampling maps to compare and summarize objects of different sizes. Our methodology leverages new de Finetti-type theorems and the recently identified phenomenon of representation stability. We illustrate the resulting framework in several applications.

CAM Colloquium: Yijun Dong
4:00–5:00 pm Jones 303
Yijun Dong
Courant Institute of Mathematical Sciences
New York University
Title: “Understanding Post-training through the Lens of Intrinsic Dimension”
Abstract: Post-training is becoming the primary interface between powerful pre-trained models and challenging real-world problems, where we aim to adapt large pre-trained models via limited, heterogeneous data while preserving their capabilities and reliability. In this talk, we take a step toward a unified theoretical and algorithmic framework for post-training through the lens of intrinsic dimension. In particular, we focus on an emerging post-training phenomenon, weak-to-strong (W2S) generalization, in which a strong pre-trained student model fine-tuned only with supervision from a weaker teacher model can often outperform its teacher. Theoretically, we explain when and why W2S generalization occurs from a sample-efficiency perspective, reveal the value of teacher-student discrepancy for W2S, and investigate the effects of systematic biases on W2S. Algorithmically, we propose a practical, theory-inspired remedy for W2S under spurious correlation. The talk will conclude with an outlook on the broad applications of random matrix tools for understanding and improving post-training.

Joint Statistics and DSI Colloquium: Tudor Manole
11:30 am–12:30 pm DSI 105
Tudor Manole
Institute for Data, Systems, and Society
Massachusetts Institute of Technology
Title: “A Statistical Framework for Benchmarking Quantum Computers”
Abstract: Recent years have witnessed quantum computing technologies increasingly move from theoretical proposals to functioning experimental platforms, reaching major milestones such as the demonstration of beyond-classical computational tasks. Despite these exciting advances, current quantum computers experience hardware-level errors that limit their scalability, and these errors must be carefully identified before they can be mitigated. In this talk, I will develop a statistical framework for characterizing errors in quantum devices, using an existing experimental protocol known as random circuit sampling. Data arising from this experiment can be described through a high-dimensional discrete latent variable model parametrized by the device’s error rates. We develop estimators for these error rates that are provably consistent even for large-scale quantum devices. We then apply our methods to benchmark a recent state-of-the-art quantum processor, obtaining a detailed report of error rates that were largely unavailable in past studies. I will close by placing these results in the broader context of my interdisciplinary work in the physical sciences, and by discussing some of my other research interests in nonparametric statistics and statistical optimal transport.
2025
Student Seminar: Raphael Rossellini
11:00 am–12:30 pm DSI 322
Monday, December 15, 2025, at 11:00 AM, in DSI 322, 5460 S. University Ave
Dissertation Proposal Presentation
Raphael Rossellini, Department of Statistics, The University of Chicago
“Testing and ensuring calibration for decision-makers”
Student Seminar: Or Goldreich
10:00–11:30 am Jones 111
Monday, December 15, 2025, at 10:00 AM, in Jones 111, 5747 S. Ellis Avenue
Dissertation Proposal Presentation
Or Goldreich, Department of Statistics, The University of Chicago
“TBA”
Student Seminar: Jimmy Lederman
2:00–3:30 pm Jones 111
Wednesday, December 10, 2025, at 2:00 PM, in Jones 111, 5747 S. Ellis Avenue
Dissertation Proposal Presentation
Jimmy Lederman, Department of Statistics, The University of Chicago
“Count-Based Data Augmentation for Flexible Probabilistic Modeling of Nonstandard Data”
Student Seminar: Benedetta Bruni
9:00–10:30 am DSI 322
Wednesday, December 10, 2025, at 9:00 AM, in DSI 322, 5460 S. University Ave
Dissertation Proposal Presentation
Benedetta Bruni, Department of Statistics, The University of Chicago
“A Generalized Bayesian Approach to Tree Models for Densities”
Student Seminar: Jeonghwan Lee
2:30–4:00 pm Jones 304
Tuesday, December 9, 2025, at 2:30 PM, in Jones 304, 5747 S. Ellis Avenue
Dissertation Proposal Presentation
Jeonghwan Lee, Department of Statistics, The University of Chicago
“Topics in modern statistical learning: Distribution shift and learning with synthetic data”
Student Seminar: Qi Chen
10:30–11:00 am Jones 111
Tuesday, December 9, 2025, at 10:30 AM, in Jones 111, 5747 S. Ellis Avenue
Master’s Thesis Presentation
Qi Chen, Department of Statistics, The University of Chicago
“Graphical Model Geometry-Aware Hamiltonian Variational Auto-Encoder”