2026

Joint Statistics and DSI Colloquium: Mateo Díaz
11:30 am–12:30 pm DSI 105
Mateo Díaz
Assistant Professor
Department of Applied Mathematics and Statistics
Mathematical Institute for Data Science
Johns Hopkins University
Title: Leveraging Structure for Faster Algorithms in Optimization and Diffusion
Abstract: Large-scale iterative methods drive modern AI, yet their theoretical foundations often lag behind their empirical success. We argue that bridging this gap requires identifying the inherent problem structure that enables these algorithms to perform well. This talk instantiates this principle across two domains: optimization and generative modeling.
First, we derive new theoretical guarantees for the Levenberg–Morrison–Marquardt method. Although this method is ubiquitous in settings that demand highly accurate solutions (for instance, when training physics-informed neural networks for scientific discovery), classical guarantees do not explain its strong empirical performance in modern overparameterized, ill-conditioned regimes. By reframing it through the lens of composite optimization, we uncover geometric conditions that ensure fast convergence even in these challenging modern regimes.
Second, we introduce Proximal Diffusion Models (PDM). While standard diffusion models rely on score-matching and forward discretization, we demonstrate that a backward discretization using proximal maps offers significant theoretical and practical advantages. Under mild conditions, we prove that PDM achieves $\varepsilon$-accuracy in KL-divergence within $\widetilde{O}(d/\sqrt{\varepsilon})$ steps and empirically demonstrate that it outperforms conventional methods using fewer sampling iterations.

DSI Distinguished Speaker Series: Jeffrey Heer
12:30–2:30 pm DSI 105
Jeffrey Heer
Jerre D. Noe Endowed Professor of Computer Science & Engineering
University of Washington
Title: Augmenting Data Scientists: The Promise and Peril of AI-Assisted Analysis
Abstract: Data analysis is a rich sensemaking process, with frequent shifts among data representations, tools, and both conceptual & mathematical models. Computational methods can go beyond fitting models and rendering charts to make in-context recommendations and even guide end-to-end analysis workflows. How does the design of such tools affect people’s exploration, modeling, and understanding of data? In this talk, we will consider methods for augmenting data science work by integrating proactive computational support into interactive tools, with the goal of providing algorithmic assistance to augment and enrich, rather than replace, people’s intellectual work. Across tasks such as data transformation, visualization, and statistical modeling, we apply artificial intelligence to bridge gaps between user intent and robust analysis results. At the same time, we need to pay careful attention to ways these methods may exacerbate bias, foster dependence, and pose challenges for the future of data analysis.
Student Seminar: Boxuan Zhang
3:00–3:30 pm Jones 111
Thursday, February 19, 2026, at 3:00 PM, in Jones 111, 5747 S. Ellis Avenue
Master’s Thesis Presentation
Boxuan Zhang, Department of Statistics, The University of Chicago
“Conformal Prediction for Bayesian Posterior”

Joint Computer Science and Data Science Institute Seminar: Shreya Shankar
2:30–3:30 pm DSI 105
Shreya Shankar
PhD Candidate in the Data Systems and Foundations Group
University of California, Berkeley
Title: Building Effective Unstructured Data Systems
Abstract: Databases and other data systems have successfully democratized data-oriented computation across domains, thanks to decades of research in system internals and end-user interfaces. However, such systems center on structured (i.e., tabular) data; unstructured data—the vast majority of data—has largely been ignored. Large language models (LLMs) now give us a building block for unstructured data analysis, and we face the same questions as in the early days of data systems—e.g., how should users author queries? How do we efficiently execute queries at scale?—but many well-established tenets from traditional data systems no longer hold. In my talk, I will present DocETL, a system I developed for unstructured data analysis. I will discuss how we had to rethink query optimization under these new assumptions, optimizing user-written pipelines for both accuracy and efficiency—as well as end-user interfaces for authoring, iterating on, and debugging pipelines. DocETL is open-source with 3.5k+ GitHub stars; our hosted interface has supported 4.1k+ pipelines across 30+ S&P-500 industries. Query optimization ideas from our work have been adopted in databases such as Snowflake and BigQuery, and our interface design principles have been adopted by companies like LangChain and OpenAI.
Student Seminar: Yushuo Li
2:00–2:30 pm Jones 111
Wednesday, February 18, 2026, at 2:00 PM, in Jones 111, 5747 S. Ellis Avenue
Master’s Thesis Presentation
Yushuo Li, Department of Statistics, The University of Chicago
“Asymptotically Optimal Conformal Prediction for Classification”
Student Seminar: Buning (Erica) Fan
1:30–2:00 pm Jones 111
Wednesday, February 18, 2026, at 1:30 PM, in Jones 111, 5747 S. Ellis Avenue
Master’s Thesis Presentation
Buning Fan, Department of Statistics, The University of Chicago
“Comparing Bayesian Software Platforms for Three-Level Mixed Effects Location Scale Models”
Student Seminar: Zixuan Qin
1:00–1:30 pm Jones 111
Wednesday, February 18, 2026, at 1:00 PM, in Jones 111, 5747 S. Ellis Avenue
Master’s Thesis Presentation
Zixuan Qin, Department of Statistics, The University of Chicago
“Operator Learning and Bispectrum-Guided Diffusion for Functional Multi-Reference Alignment”

Joint Statistics and DSI Colloquium: Ana-Andreea Stoica
2:00–3:00 pm DSI 105
Ana-Andreea Stoica
Research Group Leader in the Social Foundations of Computation Department
Max Planck Institute for Intelligent Systems
Title: Designing for Society: AI in Networks, Markets, and Platforms
Abstract: AI systems increasingly mediate how people access information, economic opportunities, and essential services. Yet when deployed in social environments—online platforms, labor markets, and information ecosystems—AI interacts with complex human behavior, strategic incentives, and structural inequality. This talk focuses on foundational challenges and opportunities for AI systems: how to design and evaluate algorithmic interventions in complex social environments. I will present recent work on causal inference under competing treatments, which formalizes how competition for user attention and strategic behavior among experimenters distort experimental data and invalidate naïve estimates of algorithmic impact. By modeling experimentation as a strategic data acquisition problem, we show how evaluation itself becomes an optimization problem, and we derive mechanisms that recover meaningful estimates despite interference and competition. I connect this problem to deriving foundational properties of AI systems that enable responsible and efficient algorithmic design. Beyond this case study, the talk highlights broader implications for the design and evaluation of AI systems in networks, markets, and platforms. I argue that responsible deployment requires rethinking evaluation methodologies to account for incentives, feedback loops, and system-level effects, and I outline how algorithmic and statistical tools can support more accountable and socially aligned AI systems.
Student Seminar: Jose Cruzado
2:00–2:30 pm Jones 111
Monday, February 16, 2026, at 2:00 PM, in Jones 111, 5747 S. Ellis Avenue
Master’s Thesis Presentation
Jose Cruzado, Department of Statistics, The University of Chicago
“Expected Gradient Outer Product Reparameterization in Deep Convolutional Networks”
Student Seminar: Kaushik Kancharla
9:00–9:30 am Jones 111
Wednesday, February 11, 2026, at 9:00 AM, in Jones 111, 5747 S. Ellis Avenue
Master’s Thesis Presentation
Kaushik Kancharla, Department of Statistics, The University of Chicago
“The Intraday Dynamics of the Volatility Term Structure”