
Statistics Colloquium: Cynthia Rudin
11:30 am–12:30 pm Jones 303
Cynthia Rudin
Department of Computer Science
Duke University
Title: TBA
Abstract: TBA

Statistics Colloquium: Csaba Szepesvari
11:30 am–12:30 pm Jones 303
Csaba Szepesvari
Department of Computing Science
University of Alberta
Title: TBA
Abstract: TBA

Statistics Colloquium: Marco Avella Medina
11:30 am–12:30 pm Jones 303
Marco Avella Medina
Department of Statistics
Columbia University
Title: TBA
Abstract: TBA

Statistics Colloquium: Michael Sobel
11:30 am–12:30 pm Jones 303
Michael Sobel
Department of Statistics
Columbia University
Title: TBA
Abstract: TBA

Statistics Colloquium: Yiqiao Zhong
11:30 am–12:30 pm Jones 303
Yiqiao Zhong
Department of Statistics
University of Wisconsin-Madison
Title: Compositionality in Large Language Models: Emergence, Generalization, and Geometry
Abstract: Large language models (LLMs) have demonstrated remarkable reasoning abilities through novel techniques such as in-context learning and chain-of-thought (CoT) reasoning. Empirically, key reasoning skills often emerge only at larger scales or after prolonged training. Yet the underlying mechanism of LLM reasoning, namely how compositional representations are formed and organized, remains poorly understood.
In this talk, I present recent progress toward uncovering emergent compositional structure through controlled synthetic experiments on small transformers and targeted intervention studies on modern LLMs. First, I show that learning a key compositional structure is essential for out-of-distribution generalization, and that this process undergoes sharp phase transitions during training. At a critical stage, an intermediate low-dimensional “bridge subspace” emerges, serving as a shared representation connecting multiple layers. Second, using arithmetic composition as a minimal testbed for CoT reasoning, I demonstrate that autoregressive training on reasoning traces exhibits distinct reasoning phases. In particular, causally faithful reasoning emerges only when training noise lies below a critical threshold.
Together, these findings suggest that core statistical principles such as low-dimensional subspaces and causality may provide key foundations for advancing the interpretability and transparency of LLMs.

Statistics Colloquium: Stefan Wager
11:30 am–12:30 pm Jones 303
Stefan Wager
Department of Statistics
Stanford University
Title: TBA
Abstract: TBA

Statistics Colloquium: Aravindan Vijayaraghavan
11:30 am–12:30 pm Jones 303
Aravindan Vijayaraghavan
Department of Computer Science
Northwestern University
Title: TBA
Abstract: TBA

Statistics Colloquium: David Blei
11:30 am–12:30 pm Jones 303
David Blei
Departments of Statistics and Computer Science
Columbia University
Title: TBA
Abstract: TBA