
Statistics Colloquium: Marco Avella Medina
11:30 am–12:30 pm Jones 303
Marco Avella Medina
Department of Statistics
Columbia University
Title: A Theoretical Framework for M-Posteriors: Frequentist Guarantees and Robustness Properties
Abstract: We provide a theoretical framework for a wide class of generalized posteriors that can be viewed as the natural Bayesian posterior counterpart of the class of M-estimators in the frequentist world. We call the members of this class M-posteriors and show that they are asymptotically normally distributed under mild conditions on the M-estimation loss and the prior. In particular, an M-posterior contracts in probability around a normal distribution centered at an M-estimator, showing frequentist consistency and suggesting some degree of robustness depending on the reference M-estimator. We formalize the robustness properties of the M-posteriors by a new characterization of the posterior influence function and a novel definition of breakdown point adapted for posterior distributions. We illustrate the wide applicability of our theory in various popular models and discuss extensions to variational inference. Finally, we demonstrate the empirical relevance of our results in numerical examples.
This is based on joint work with Juraj Marusic and Cynthia Rush.
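To make the construction concrete, here is a minimal sketch (not the speakers' implementation) of an M-posterior in a one-dimensional location model: the log-likelihood in Bayes' rule is replaced by the negative Huber M-estimation loss, giving a Gibbs-style posterior that is evaluated on a grid. The helper names and the N(0, 10) prior are illustrative choices, not from the talk.

```python
import numpy as np

def huber_loss(r, c=1.345):
    # Huber's rho: quadratic near zero, linear in the tails
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * a - 0.5 * c**2)

def m_posterior_grid(x, grid, prior_logpdf, c=1.345):
    # Gibbs-style posterior: log pi(theta | x) = log prior - sum_i rho(x_i - theta) + const
    logpost = np.array([-huber_loss(x - t, c).sum() for t in grid]) + prior_logpdf(grid)
    logpost -= logpost.max()              # stabilize before exponentiating
    post = np.exp(logpost)
    dx = grid[1] - grid[0]
    return post / (post.sum() * dx)       # normalize on the uniform grid

rng = np.random.default_rng(0)
# 95 clean observations around 0, plus 5 gross outliers at 10
x = np.concatenate([rng.normal(0.0, 1.0, 95), np.full(5, 10.0)])

grid = np.linspace(-3.0, 3.0, 2001)
prior_logpdf = lambda t: -0.5 * t**2 / 10.0   # N(0, 10) prior, up to a constant
post = m_posterior_grid(x, grid, prior_logpdf)
post_mean = ((grid * post).sum()) * (grid[1] - grid[0])

# The M-posterior mean stays near 0 despite the contamination,
# while the plain sample mean is pulled toward the outliers.
print(post_mean, x.mean())
```

The bounded influence of the Huber loss is what keeps the posterior centered near the bulk of the data, mirroring the robustness the abstract attributes to the reference M-estimator.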
DSI Distinguished Speaker Series: Lillian Lee
12:00–1:30 pm DSI 105
Lillian Lee
Charles Roy Davis Professor of Computer Science
Cornell University
Title: Taking a turn for the better? Pivoting and pivotal moments in consequential conversations
Abstract: So much of human interaction occurs as conversations, and it is both fascinating and imperative to analyze them. Recently, my co-authors and I have turned to texting-based conversations between mental-health therapists or crisis counselors and their clients, seeking to identify “key” moments in these exchanges:
(1) A “pivoting” moment corresponds to a *redirection* of the conversation introduced by one party that is accepted/followed by the other. We develop a probabilistic measure of how much an utterance immediately redirects the flow of the conversation, accounting for both the intention and the actual realization of such a change.
(2) In a *pivotal* moment, the conversation’s outcome hangs in the balance: how one responds can put the conversation on substantially diverging trajectories leading to significantly different results. We formalize this intuition by estimating the variance in expectation of outcome depending on what might be said next.
We find significant correlates of our measures in real human conversations on widely used platforms. For example, the patients in our longer-term mental-health-therapy data who redirected less in their first few sessions were significantly more likely to eventually express dissatisfaction with their therapist and terminate the relationship; and the staff responses in our crisis-counseling data had greater estimated impact on disengagement rates during pivotal moments than during non-pivotal ones.
Joint work with Vivian Nguyen, Cristian Danescu-Niculescu-Mizil, Thomas D. Hull, and Sang Min (Dave) Jung.
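The pivotality notion in (2) can be illustrated with a toy calculation (my sketch, not the authors' measure): given a model's predicted outcome probability for each plausible next response, score the moment by the variance of those expected outcomes. The `pivotality` helper and its inputs are hypothetical.

```python
import numpy as np

def pivotality(candidate_outcome_probs):
    """Toy pivotality score for a conversational moment.

    candidate_outcome_probs: for each plausible next response, a model's
    probability of a good final outcome if that response is sent.
    The score is the variance of these expected outcomes: high variance
    means what is said next can put the conversation on substantially
    diverging trajectories.
    """
    p = np.asarray(candidate_outcome_probs, dtype=float)
    return float(p.var())

# A pivotal moment: plausible replies lead to very different outcomes.
high = pivotality([0.9, 0.2, 0.7, 0.1])
# A non-pivotal moment: any reasonable reply leads to a similar outcome.
low = pivotality([0.55, 0.60, 0.50, 0.58])
print(high, low)
```

In the real setting the candidate responses and their outcome predictions would come from learned models over conversation histories; the variance-in-expectation idea is the same.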

Statistics Colloquium: Yiqiao Zhong
11:30 am–12:30 pm Jones 303
Yiqiao Zhong
Department of Statistics
University of Wisconsin-Madison
Title: Compositionality in Large Language Models: Emergence, Generalization, and Geometry
Abstract: Large language models (LLMs) have demonstrated remarkable reasoning abilities through novel techniques such as in-context learning and chain-of-thought (CoT) reasoning. Empirically, key reasoning skills often emerge only at larger scales or after prolonged training. Yet the underlying mechanism of LLM reasoning, namely how compositional representations are formed and organized, remains poorly understood.
In this talk, I present recent progress toward uncovering emergent compositional structure through controlled synthetic experiments on small transformers and targeted intervention studies on modern LLMs. First, I show that learning a key compositional structure is essential for out-of-distribution generalization, and that this process undergoes sharp phase transitions during training. At a critical stage, an intermediate low-dimensional “bridge subspace” emerges, serving as a shared representation connecting multiple layers. Second, using arithmetic composition as a minimal testbed for CoT reasoning, I demonstrate that autoregressive training on reasoning traces exhibits distinct reasoning phases. In particular, causally faithful reasoning emerges only when training noise lies below a critical threshold.
Together, these findings suggest that core statistical principles such as low-dimensional subspaces and causality may provide key foundations for advancing the interpretability and transparency of LLMs.

Statistics Colloquium: Stefan Wager
11:30 am–12:30 pm Jones 303
Stefan Wager
Department of Statistics
Stanford University
Title: TBA
Abstract: TBA

Statistics Colloquium: Aravindan Vijayaraghavan
11:30 am–12:30 pm Jones 303
Aravindan Vijayaraghavan
Department of Computer Science
Northwestern University
Title: TBA
Abstract: TBA

Statistics Colloquium: David Blei
11:30 am–12:30 pm Jones 303
David Blei
Departments of Statistics and Computer Science
Columbia University
Title: TBA
Abstract: TBA