Past Events

2025

Statistics Colloquium: Linjun Zhang

11:30 am–12:30 pm Jones 303

Linjun Zhang, Associate Professor in the Department of Statistics at Rutgers University

Title: A Statistical Hypothesis Testing Framework for Data Misappropriation Detection in Large Language Models

Abstract: Large Language Models (LLMs) have rapidly gained enormous popularity in recent years. However, the training of LLMs has raised significant privacy and legal concerns, particularly regarding the inclusion of copyrighted materials in their training data without proper attribution or licensing, which falls under the broader issue of data misappropriation. In this article, we focus on a specific problem of data misappropriation detection, namely, to determine whether a given LLM has incorporated data generated by another LLM. To address this issue, we propose embedding watermarks into the copyrighted training data and formulating the detection of data misappropriation as a hypothesis testing problem. We develop a general statistical testing framework, construct a pivotal statistic, determine the optimal rejection threshold, and explicitly control the type I and type II errors. Furthermore, we establish the asymptotic optimality properties of the proposed tests and demonstrate their empirical effectiveness through intensive numerical experiments.

Apr 21

Student Seminar: Wei Kuang

3:00–5:00 pm Cobb 203

Friday, April 18, 2025, at 3:00 PM, in Cobb 203, 5811 S. Ellis Avenue
PhD Dissertation Defense Presentation
Wei Kuang, Department of Statistics, The University of Chicago
“Estimation Using Second-Order Methods”

Apr 18

Student Seminar: Oscar Liu

2:00–3:00 pm Ryerson 176

Friday, April 18, 2025, at 2:00 PM, in Ryerson 176, 1100 E 58th St.
Master’s Thesis Presentation
Oscar Liu, Department of Statistics, The University of Chicago
“Bias Correction of Ground Temperature in Hawaii Using Gaussian Process Models”

Apr 18

Student Seminar: Zihao Wang

1:00–3:00 pm Jones 226

Friday, April 18, 2025, at 1:00 PM, in Jones 226, 5747 S. Ellis Avenue
PhD Dissertation Defense Presentation
Zihao Wang, Department of Statistics, The University of Chicago
“Understanding and Steering Large Generative Models: From Representation Geometry to Stress-Testing Generative Behavior”

Apr 18

Student Seminar: YoonHaeng Hur

10:00 am–12:00 pm Ryerson 255

Friday, April 18, 2025, at 10:00 AM, in Ryerson 255, 1100 E 58th St.
PhD Dissertation Defense Presentation
YoonHaeng Hur, Department of Statistics, The University of Chicago
“Infinite-Dimensional Inference and Learning via Optimal Transport”

Apr 18

Student Seminar: Bryce Jiang

10:30–11:00 am Jones 303

Wednesday, April 16, 2025, at 10:30 AM, in Jones 303, 5747 S. Ellis Avenue
Master’s Thesis Presentation
Bryce Jiang, Department of Statistics, The University of Chicago
“Supermask-enhanced Foundation Model for Continual Time Series Forecasting”

Apr 16

Student Seminar: Xinyi Wang

2:30–3:00 pm Jones 111

Tuesday, April 15, 2025, at 2:30 PM, in Jones 111, 5747 S. Ellis Avenue
Master’s Thesis Presentation
Xinyi Wang, Department of Statistics, The University of Chicago
“Robust Empirical Bayesian Hierarchical Model for Covariate Adjustment in Stratified Randomized Experiments”

Apr 15

Student Seminar: Wanyi Ling

2:00–2:30 pm Jones 111

Tuesday, April 15, 2025, at 2:00 PM, in Jones 111, 5747 S. Ellis Avenue
Master’s Thesis Presentation
Wanyi Ling, Department of Statistics, The University of Chicago
“An empirical partially Bayes method for adjusting batch effects”

Apr 15

Statistics Colloquium: Ashwin Pananjady

11:30 am–12:30 pm Jones 303

Ashwin Pananjady, H. Milton Stewart School of Industrial and Systems Engineering/The School of Electrical and Computer Engineering, Georgia Institute of Technology

Title: Predicting the behavior of complex iterative algorithms with random data

Abstract: Iterative algorithms are the workhorses of modern statistical signal processing and machine learning. Algorithm design and analysis is largely based on variational properties of the optimization problem, and the classical focus has been on obtaining convergence guarantees over classes of problems that possess certain types of geometry. However, modern optimization problems in statistical settings are high-dimensional and involve random data, and algorithms often behave differently from what is suggested by classical theory. With the motivation of better understanding optimization in such settings, I will present a toolbox for deriving “state evolutions” for a wide variety of algorithms with random data. These are non-asymptotic, near-exact predictions of the statistical behavior of the algorithm, which apply even when the underlying optimization problem is nonconvex or the algorithm is randomly initialized. We will showcase these predictions on deterministic and stochastic variants of complex algorithms employed in some canonical statistical models.

Apr 14

Student Seminar: Sean Richardson

12:30–1:00 pm Jones 111

Thursday, April 10, 2025, at 12:30 PM, in Jones 111, 5747 S. Ellis Avenue
Master’s Thesis Presentation
Sean Richardson, Department of Statistics, The University of Chicago
“Causal Evaluation of Black-Box Reward Models via Textual Interventions”

Apr 10