Events: Statistics Colloquium

Joint Statistics and DSI Colloquium: Tudor Manole

11:30 am–12:30 pm, DSI 105

Tudor Manole
Institute for Data, Systems, and Society
Massachusetts Institute of Technology

Title: “A Statistical Framework for Benchmarking Quantum Computers”

Abstract: Recent years have witnessed quantum computing technologies increasingly move from theoretical proposals to functioning experimental platforms, reaching major milestones such as the demonstration of beyond-classical computational tasks. Despite these exciting advances, current quantum computers experience hardware-level errors which limit their scalability, and which must be carefully identified before they can be mitigated. In this talk, I will develop a statistical framework for characterizing errors in quantum devices, using an existing experimental platform known as random circuit sampling. Data arising from this experiment can be described through a high-dimensional discrete latent variable model parametrized by the device’s error rates. We develop estimators for these error rates which are provably consistent even for large-scale quantum devices. We then apply our methods to benchmark a recent state-of-the-art quantum processor, obtaining a detailed report of error rates which were largely unavailable from past studies. I will close by placing these results in the broader context of my interdisciplinary work in the physical sciences, and by discussing some of my other research interests in nonparametric statistics and statistical optimal transport.
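
To give a concrete, deliberately simplified sense of the kind of latent-variable model involved, the toy sketch below is a hedged illustration rather than the speaker's actual framework: it assumes each sampled bitstring comes either from the circuit's ideal output distribution (with probability F) or from uniform noise, and recovers F by maximum likelihood. The mixture model, the parameter F, and all sizes are assumptions made only for this sketch.

```python
# Toy sketch (illustrative assumptions, not the speaker's model): observed
# bitstrings follow a two-component mixture in which a latent variable says
# whether the circuit ran error-free,
#     q(x) = F * p_ideal(x) + (1 - F) / D,
# and the "error-free probability" F is recovered by maximum likelihood.
import numpy as np

rng = np.random.default_rng(0)
n_qubits = 10
D = 2 ** n_qubits

# Stand-in for the classically simulated ideal output distribution of a circuit.
p_ideal = rng.dirichlet(np.ones(D))

F_true = 0.7                               # hypothetical error-free probability
q = F_true * p_ideal + (1 - F_true) / D    # noisy output distribution
samples = rng.choice(D, size=20_000, p=q)  # observed bitstrings

# Maximum-likelihood estimate of F by a simple grid search.
grid = np.linspace(0.0, 1.0, 1001)
loglik = [np.log(f * p_ideal[samples] + (1 - f) / D).sum() for f in grid]
F_hat = grid[int(np.argmax(loglik))]
print(f"true F = {F_true:.2f}, estimated F = {F_hat:.3f}")
```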

Jan 12

CAM Colloquium: Yijun Dong

4:00–5:00 pm, Jones 303

Yijun Dong
Courant Institute of Mathematical Sciences
New York University

Title: “Understanding Post-training through the Lens of Intrinsic Dimension”

Abstract: Post-training is becoming the primary interface between powerful pre-trained models and challenging real-world problems, where we aim to adapt large pre-trained models via limited, heterogeneous data while preserving their capabilities and reliability. In this talk, we take a step toward a unified theoretical and algorithmic framework for post-training through the lens of intrinsic dimensions. In particular, we focus on an emerging post-training phenomenon, weak-to-strong (W2S) generalization, in which a strong pre-trained student model fine-tuned only with supervision from a weaker teacher model can often outperform its teacher. Theoretically, we explain when and why W2S generalization occurs from a sample-efficiency perspective, reveal the value of teacher-student discrepancy for W2S, and investigate the effects of systematic biases on W2S. Algorithmically, we propose a practical, theory-inspired remedy for W2S under spurious correlation. The talk will conclude with an outlook on the broad applications of random matrix tools for understanding and improving post-training.
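
As a hedged, self-contained illustration of the weak-to-strong setup (the two-moons data, the logistic-regression "weak teacher," and the MLP "strong student" are choices made only for this sketch, not the speaker's theory or experiments), the snippet below trains a weak teacher on a few labels, trains a more flexible student only on the teacher's pseudo-labels, and then asks whether the student can nonetheless beat its teacher on held-out data.

```python
# Toy sketch of weak-to-strong (W2S) generalization on synthetic data.
# All models and data here are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=6000, noise=0.25, random_state=0)
X_lab, y_lab = X[:100], y[:100]      # small labeled set for the weak teacher
X_unlab = X[100:5000]                # unlabeled pool seen only by the student
X_test, y_test = X[5000:], y[5000:]  # held-out evaluation set

weak_teacher = LogisticRegression().fit(X_lab, y_lab)  # linear, underpowered
pseudo = weak_teacher.predict(X_unlab)                  # imperfect supervision

# Strong student: trained only on the weak teacher's pseudo-labels.
strong_student = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500,
                               random_state=0).fit(X_unlab, pseudo)

print("teacher accuracy:", weak_teacher.score(X_test, y_test))
print("student accuracy:", strong_student.score(X_test, y_test))
```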

Jan 15

CAM Colloquium: Eitan Levin

4:00–5:00 pm, Jones 303

Eitan Levin
Applied and Computational Mathematics
California Institute of Technology

Title: “Any-Dimensional Data Science”

Abstract: Many applications throughout data science require methods that are well-defined and performant for problems or data of any size. In machine learning, we are given training data from which we wish to learn algorithms capable of solving problems of any size. In particular, the learned algorithm must generalize to inputs of sizes that are not present in the training set. For example, algorithms for processing graphs or point clouds must generalize to inputs with any number of nodes or points. A second challenge pertaining to any-dimensionality arises in applications such as game theory or network statistics in which we wish to characterize solutions to problems of growing size. Examples include computing values of games with any number of players, or proving moment inequalities for random vectors and graphs of any size. From an optimization perspective, this amounts to deriving bounds that hold for entire sequences of problems of growing dimensionality. Finally, in applications involving graph-valued data, we wish to produce constant-sized summaries of arbitrarily large networks that preserve their essential structural properties. These summaries can then be used for efficiently testing properties of the underlying large network, e.g., testing for the presence of hubs, a question of interest in massive biological and traffic networks. We develop a unified framework to tackle such any-dimensional problems by using random sampling maps to compare and summarize objects of different sizes. Our methodology leverages new de Finetti-type theorems and the recently identified phenomenon of representation stability. We illustrate the resulting framework for any-dimensional problems in several applications.
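
As an illustrative sketch of the sampling idea (a toy under assumed Erdős–Rényi graphs, not the framework developed in the talk), the code below builds a constant-size summary of a graph by averaging edge and triangle densities over random fixed-size induced subgraphs; graphs of very different sizes drawn from the same model then receive comparable summaries.

```python
# Toy sketch: compare graphs of different sizes via a fixed-size sampling
# summary (average edge and triangle densities of random induced subgraphs).
# The generative model and statistics are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def random_graph(n, p):
    """Adjacency matrix of an Erdos-Renyi graph G(n, p)."""
    A = (rng.random((n, n)) < p).astype(int)
    A = np.triu(A, 1)
    return A + A.T

def sampling_summary(A, k=20, n_samples=2000):
    """Average edge and triangle densities over random k-node induced subgraphs."""
    n = A.shape[0]
    edge_d, tri_d = [], []
    for _ in range(n_samples):
        idx = rng.choice(n, size=k, replace=False)
        S = A[np.ix_(idx, idx)]
        edge_d.append(S.sum() / (k * (k - 1)))                        # edge density
        tri_d.append(np.trace(S @ S @ S) / (k * (k - 1) * (k - 2)))   # triangle density
    return np.mean(edge_d), np.mean(tri_d)

# Graphs of very different sizes from the same model get similar summaries.
print(sampling_summary(random_graph(300, 0.1)))
print(sampling_summary(random_graph(3000, 0.1)))
```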

Jan 22

Statistics Colloquium: Sungwoo Jeong

11:30 am–12:30 pm, Jones 303

Sungwoo Jeong
Department of Mathematics
Cornell University

Title: “Convergence of Two Kernel Algorithms: Continuous Analogues of the SVD and the Cholesky Decomposition”

Abstract: Kernels have been shown to be effective in numerous applications. We discuss some misconceptions and facts about why kernels are powerful in practice. To investigate this, we consider two expansions that encode fundamental kernel structures: the kernel analogue of the SVD and the Cholesky decomposition.

First, the convergence of the kernel SVD, i.e., the singular value expansion (SVE), is equivalent to the existence of a corresponding function space such as the reproducing kernel Hilbert space (RKHS). For general kernels, such as self-attention in neural networks, it is still unclear whether the SVE converges. We prove a surprising result showing that kernel continuity alone is not enough to guarantee this convergence. At the same time, we provide a new sufficient condition for convergence that helps explain why kernels work well in practice.
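
For readers who want the objects written out, the display below records only the standard textbook definitions (background, not results from the talk): the singular value expansion of a kernel and its symmetric (Mercer) special case.

```latex
% Standard background, not a result from the talk: the singular value
% expansion (SVE) of a kernel, when it converges, takes the form
K(x, y) = \sum_{j \ge 1} \sigma_j \, u_j(x) \, v_j(y),
\qquad \sigma_1 \ge \sigma_2 \ge \cdots \ge 0,
% with \{u_j\} and \{v_j\} orthonormal families in L^2. For a continuous,
% symmetric, positive semidefinite kernel, this reduces to Mercer's expansion
K(x, x') = \sum_{j \ge 1} \lambda_j \, \phi_j(x) \, \phi_j(x'),
% where (\lambda_j, \phi_j) are eigenpairs of the associated integral operator.
```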

The kernel Cholesky algorithm is another fundamental tool used in applications such as Gaussian process regression (Bayesian inference on functions). While it is empirically observed that the Cholesky algorithm converges for smooth kernels, no rigorous result exists for kernels with weaker regularity than $C^2$. We prove a new convergence result for Lipschitz continuous kernels, together with an explicit convergence rate that sharply agrees with what is observed in practice.
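
As a concrete companion, here is a minimal sketch of the standard pivoted (greedy) Cholesky low-rank approximation applied to an assumed Gaussian kernel matrix; it is not claimed to be the algorithm analyzed in the talk, but it shows the rapid decay of the residual trace that one observes for smooth kernels.

```python
# Toy sketch: pivoted Cholesky of a kernel matrix, a standard low-rank
# approximation used in Gaussian process regression. Kernel and sizes are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1, 1, 400))
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.1)  # smooth (Gaussian) kernel

def pivoted_cholesky(K, rank):
    """Greedy rank-`rank` Cholesky factor L with K ~= L @ L.T."""
    n = K.shape[0]
    d = np.diag(K).astype(float).copy()   # residual diagonal
    L = np.zeros((n, rank))
    for j in range(rank):
        i = int(np.argmax(d))                                  # largest residual pivot
        L[:, j] = (K[:, i] - L[:, :j] @ L[i, :j]) / np.sqrt(d[i])
        d -= L[:, j] ** 2
        print(f"rank {j + 1:2d}: residual trace = {d.sum():.2e}")
    return L

L = pivoted_cholesky(K, rank=15)
```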

Jan 26

CAM Colloquium: Yuanzhao Zhang

4:00–5:00 pm, Jones 303

Yuanzhao Zhang
Santa Fe Institute

Title: “Physics-uninformed machine learning”

Abstract: How much can we learn about a complex system when governing equations are unknown and data are scarce? This talk explores how modern machine-learning models can extrapolate beyond limited training data without physics priors. First, I will show that simple recurrent neural networks with no built-in physics can unexpectedly reconstruct basins of attraction in multistable systems, even for basins not seen during training. This surprising generalization capability raises questions about which hidden inductive biases are implicitly regularizing the model. Second, I will examine time-series foundation models and their ability to forecast entirely new dynamical systems from a short context trajectory. These models often achieve strong zero-shot performance in forecasting chaotic systems by exploiting a strategy we term context parroting. Analyzing the parroting strategy provides insights into the capabilities of current foundation models and explains observed neural scaling laws. Exploring strategies beyond parroting may further reveal how both artificial and natural intelligence extract information from limited data.
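
To make the "context parroting" idea tangible, here is a hedged toy baseline (the function name, window and horizon choices, and the logistic-map test signal are assumptions for illustration, not the speakers' models): it forecasts by locating the past context window most similar to the recent history and copying the continuation that followed it.

```python
# Toy sketch of a parroting-style zero-shot forecaster: match the most recent
# window against the context and "parrot" the continuation of the best match.
import numpy as np

def parroting_forecast(context, window=20, horizon=50):
    """Forecast by analogue matching within the context (illustrative only)."""
    query = context[-window:]
    best_i, best_err = 0, np.inf
    # Compare the query against every earlier window of the same length.
    for i in range(len(context) - window - horizon):
        err = np.sum((context[i:i + window] - query) ** 2)
        if err < best_err:
            best_i, best_err = i, err
    # Copy the continuation that followed the best-matching window.
    return context[best_i + window: best_i + window + horizon]

# Chaotic test signal: the logistic map in its chaotic regime.
x = np.empty(2000)
x[0] = 0.4
for t in range(1999):
    x[t + 1] = 3.9 * x[t] * (1 - x[t])

pred = parroting_forecast(x[:1500], window=20, horizon=50)
truth = x[1500:1550]
print("mean squared error over the horizon:", np.mean((pred - truth) ** 2))
```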

Jan 29