2025

Statistics Colloquium: Linjun Zhang
11:30 am–12:30 pm Jones 303
Linjun Zhang, Associate Professor, Department of Statistics, Rutgers University
Title: A Statistical Hypothesis Testing Framework for Data Misappropriation Detection in Large Language Models
Abstract: Large Language Models (LLMs) have rapidly gained enormous popularity in recent years. However, the training of LLMs has raised significant privacy and legal concerns, particularly regarding the inclusion of copyrighted materials in their training data without proper attribution or licensing, which falls under the broader issue of data misappropriation. In this article, we focus on a specific problem of data misappropriation detection, namely, determining whether a given LLM has incorporated data generated by another LLM. To address this issue, we propose embedding watermarks into the copyrighted training data and formulating the detection of data misappropriation as a hypothesis testing problem. We develop a general statistical testing framework, construct a pivotal statistic, determine the optimal rejection threshold, and explicitly control the type I and type II errors. Furthermore, we establish the asymptotic optimality properties of the proposed tests and demonstrate their empirical effectiveness through intensive numerical experiments.

Student Seminar: Wei Kuang
3:00–5:00 pm Cobb 203
Friday, April 18, 2025, at 3:00 PM, in Cobb 203, 5811 S. Ellis Avenue
PhD Dissertation Defense Presentation
Wei Kuang, Department of Statistics, The University of Chicago
“Estimation Using Second-Order Methods”
Student Seminar: Oscar Liu
2:00–3:00 pm Ryerson 176
Friday, April 18, 2025, at 2:00 PM, in Ryerson 176, 1100 E 58th St.
Master’s Thesis Presentation
Oscar Liu, Department of Statistics, The University of Chicago
“Bias Correction of Ground Temperature in Hawaii Using Gaussian Process Models”

Student Seminar: Zihao Wang
1:00–3:00 pm Jones 226
Friday, April 18, 2025, at 1:00 PM, in Jones 226, 5747 S. Ellis Avenue
PhD Dissertation Defense Presentation
Zihao Wang, Department of Statistics, The University of Chicago
“Understanding and Steering Large Generative Models: From Representation Geometry to Stress-Testing Generative Behavior”

Student Seminar: YoonHaeng Hur
10:00 am–12:00 pm Ryerson 255
Friday, April 18, 2025, at 10:00 AM, in Ryerson 255, 1100 E 58th St.
PhD Dissertation Defense Presentation
YoonHaeng Hur, Department of Statistics, The University of Chicago
“Infinite-Dimensional Inference and Learning via Optimal Transport”

Student Seminar: Bryce Jiang
10:30–11:00 am Jones 303
Wednesday, April 16, 2025, at 10:30 AM, in Jones 303, 5747 S. Ellis Avenue
Master’s Thesis Presentation
Bryce Jiang, Department of Statistics, The University of Chicago
“Supermask-enhanced Foundation Model for Continual Time Series Forecasting”
Student Seminar: Xinyi Wang
2:30–3:00 pm Jones 111
Tuesday, April 15, 2025, at 2:30 PM, in Jones 111, 5747 S. Ellis Avenue
Master’s Thesis Presentation
Xinyi Wang, Department of Statistics, The University of Chicago
“Robust Empirical Bayesian Hierarchical Model for Covariate Adjustment in Stratified Randomized Experiments”
Student Seminar: Wanyi Ling
2:00–2:30 pm Jones 111
Tuesday, April 15, 2025, at 2:00 PM, in Jones 111, 5747 S. Ellis Avenue
Master’s Thesis Presentation
Wanyi Ling, Department of Statistics, The University of Chicago
“An empirical partially Bayes method for adjusting batch effects”

Statistics Colloquium: Ashwin Pananjady
11:30 am–12:30 pm Jones 303
Ashwin Pananjady, H. Milton Stewart School of Industrial and Systems Engineering/The School of Electrical and Computer Engineering, Georgia Institute of Technology
Title: Predicting the behavior of complex iterative algorithms with random data
Abstract: Iterative algorithms are the workhorses of modern statistical signal processing and machine learning. Algorithm design and analysis is largely based on variational properties of the optimization problem, and the classical focus has been on obtaining convergence guarantees over classes of problems that possess certain types of geometry. However, modern optimization problems in statistical settings are high-dimensional and involve random data, and algorithms often behave differently from what is suggested by classical theory. With the motivation of better understanding optimization in such settings, I will present a toolbox for deriving “state evolutions” for a wide variety of algorithms with random data. These are non-asymptotic, near-exact predictions of the statistical behavior of the algorithm, which apply even when the underlying optimization problem is nonconvex or the algorithm is randomly initialized. We will showcase these predictions on deterministic and stochastic variants of complex algorithms employed in some canonical statistical models.
Student Seminar: Sean Richardson
12:30–1:00 pm Jones 111
Thursday, April 10, 2025, at 12:30 PM, in Jones 111, 5747 S. Ellis Avenue
Master’s Thesis Presentation
Sean Richardson, Department of Statistics, The University of Chicago
“Causal Evaluation of Black-Box Reward Models via Textual Interventions”