Events: Lectures

Joint Statistics and DSI Colloquium: Tudor Manole

Jan 12, 11:30 am–12:30 pm, DSI 105

Tudor Manole
Institute for Data, Systems, and Society
Massachusetts Institute of Technology

Title: “A Statistical Framework for Benchmarking Quantum Computers”

Abstract: Recent years have witnessed quantum computing technologies increasingly move from theoretical proposals to functioning experimental platforms, reaching major milestones such as the demonstration of beyond-classical computational tasks. Despite these exciting advances, current quantum computers experience hardware-level errors that limit their scalability and that must be carefully identified before they can be mitigated. In this talk, I will develop a statistical framework for characterizing errors in quantum devices, using an existing experimental protocol known as random circuit sampling. Data arising from this experiment can be described through a high-dimensional discrete latent variable model parametrized by the device’s error rates. We develop estimators for these error rates that are provably consistent even for large-scale quantum devices. We then apply our methods to benchmark a recent state-of-the-art quantum processor, obtaining a detailed report of error rates that were largely unavailable from past studies. I will close by placing these results in the broader context of my interdisciplinary work in the physical sciences, and by discussing some of my other research interests in nonparametric statistics and statistical optimal transport.
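
For a flavor of how an error rate can be read off from random circuit sampling data, the toy sketch below implements the standard linear cross-entropy benchmarking (XEB) estimator under a simple global-depolarizing noise model. This illustrates the general idea only, not the estimator developed in the talk; the device size, fidelity, and sample count are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_qubits = 10            # hypothetical device size
dim = 2 ** n_qubits
true_fidelity = 0.6      # hypothetical circuit fidelity to be recovered
n_samples = 200_000

# Ideal output probabilities of a random circuit are modeled by a random
# complex Gaussian state (Porter-Thomas statistics), a standard stand-in.
amps = rng.normal(size=dim) + 1j * rng.normal(size=dim)
p_ideal = np.abs(amps) ** 2
p_ideal /= p_ideal.sum()

# Global depolarizing noise: with probability F the device samples from the
# ideal distribution, otherwise it emits a uniformly random bitstring.
p_noisy = true_fidelity * p_ideal + (1 - true_fidelity) / dim
samples = rng.choice(dim, size=n_samples, p=p_noisy)

# Linear XEB: the mean ideal probability of the observed bitstrings,
# rescaled against its values under ideal and uniform sampling, recovers F.
f_hat = (dim * p_ideal[samples].mean() - 1) / (dim * (p_ideal ** 2).sum() - 1)
print(f"estimated fidelity: {f_hat:.3f} (true: {true_fidelity})")
```

Under this model the estimator is unbiased: the mean ideal probability of the observed bitstrings interpolates linearly between its value under the ideal distribution and its value 1/dim under uniform noise, so the rescaled statistic sits at exactly F.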

CAM Colloquium: Yijun Dong

Jan 15, 4:00–5:00 pm, Jones 303

Yijun Dong
Courant Institute of Mathematical Sciences
New York University

Title: “Understanding Post-training through the Lens of Intrinsic Dimension”

Abstract: Post-training is becoming the primary interface between powerful pre-trained models and challenging real-world problems, where we aim to adapt large pre-trained models via limited, heterogeneous data while preserving their capabilities and reliability. In this talk, we take a step toward a unified theoretical and algorithmic framework for post-training through the lens of intrinsic dimension. In particular, we focus on an emerging post-training phenomenon, weak-to-strong (W2S) generalization, in which a strong pre-trained student model fine-tuned only with supervision from a weaker teacher model can often outperform its teacher. Theoretically, we explain when and why W2S generalization occurs from a sample-efficiency perspective, reveal the value of teacher-student discrepancy for W2S, and investigate the effects of systematic biases on W2S. Algorithmically, we propose a practical, theory-inspired remedy for W2S under spurious correlation. The talk will conclude with an outlook on the broad applications of random matrix tools for understanding and improving post-training.
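
For readers new to the W2S setup, the sketch below shows its basic protocol with toy scikit-learn models: a low-capacity teacher is trained on a small labeled set, and a higher-capacity student is then fit purely on the teacher's pseudo-labels, never on ground truth. The dataset, the models, and the restriction of the teacher to two features are illustrative stand-ins, not the talk's experiments; whether the student actually outperforms the teacher depends on the data and regularization, which is the kind of question the talk's sample-efficiency analysis addresses.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy task with 20 informative features; the weak teacher only sees two of
# them, mimicking a lower-capacity model.
X, y = make_classification(n_samples=6000, n_features=20, n_informative=20,
                           n_redundant=0, random_state=0)
X_lab, X_rest, y_lab, y_rest = train_test_split(X, y, train_size=1200,
                                                random_state=0)
X_unlab, X_test, _, y_test = train_test_split(X_rest, y_rest, test_size=1200,
                                              random_state=0)

# Weak teacher: trained on the small labeled set through its narrow view.
teacher = LogisticRegression().fit(X_lab[:, :2], y_lab)
weak_labels = teacher.predict(X_unlab[:, :2])

# Strong student: fine-tuned only on the teacher's noisy pseudo-labels.
student = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500,
                        random_state=0).fit(X_unlab, weak_labels)

print("teacher accuracy:", teacher.score(X_test[:, :2], y_test))
print("student accuracy:", student.score(X_test, y_test))
```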

CAM Colloquium: Eitan Levin

Jan 22, 4:00–5:00 pm, Jones 303

Eitan Levin
Applied and Computational Mathematics
California Institute of Technology

Title: “Any-Dimensional Data Science”

Abstract: Many applications throughout data science require methods that are well-defined and performant for problems or data of any size. In machine learning, we are given training data from which we wish to learn algorithms capable of solving problems of any size. In particular, the learned algorithm must generalize to inputs of sizes that are not present in the training set. For example, algorithms for processing graphs or point clouds must generalize to inputs with any number of nodes or points. A second challenge pertaining to any-dimensionality arises in applications such as game theory or network statistics, in which we wish to characterize solutions to problems of growing size. Examples include computing values of games with any number of players, or proving moment inequalities for random vectors and graphs of any size. From an optimization perspective, this amounts to deriving bounds that hold for entire sequences of problems of growing dimensionality. Finally, in applications involving graph-valued data, we wish to produce constant-sized summaries of arbitrarily large networks that preserve their essential structural properties. These summaries can then be used to efficiently test properties of the underlying large network; for example, testing for the presence of hubs is of interest in massive biological and traffic networks. We develop a unified framework to tackle such any-dimensional problems by using random sampling maps to compare and summarize objects of different sizes. Our methodology leverages new de Finetti-type theorems and the recently identified phenomenon of representation stability. We illustrate the resulting framework in several applications.
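
One concrete instance of the random-sampling idea fits in a few lines: summarize a network of any size by the statistics of randomly sampled fixed-size vertex subsets, so that graphs with different numbers of nodes become directly comparable. The sketch below is a toy in this spirit, not the framework from the talk; the function names and parameters are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_densities(adj, k=3, n_samples=5000):
    """Constant-size summary of a graph of any size: estimated edge and
    triangle densities of randomly sampled k-node induced subgraphs."""
    n = adj.shape[0]
    edge_total = tri_total = 0.0
    n_pairs = k * (k - 1) / 2
    n_triples = k * (k - 1) * (k - 2) / 6
    for _ in range(n_samples):
        idx = rng.choice(n, size=k, replace=False)
        sub = adj[np.ix_(idx, idx)]
        edge_total += sub.sum() / 2 / n_pairs
        tri_total += np.trace(sub @ sub @ sub) / 6 / n_triples
    return edge_total / n_samples, tri_total / n_samples

def erdos_renyi(n, p):
    """Symmetric 0/1 adjacency matrix of a G(n, p) random graph."""
    upper = np.triu((rng.random((n, n)) < p).astype(int), 1)
    return upper + upper.T

# Graphs of very different sizes but the same local structure yield nearly
# identical constant-size summaries (edge density ~0.1, triangles ~0.001).
print(sampled_densities(erdos_renyi(200, 0.1)))
print(sampled_densities(erdos_renyi(2000, 0.1)))
```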

Statistics Colloquium: Sungwoo Jeong

Jan 26, 11:30 am–12:30 pm, Jones 303

Sungwoo Jeong
Department of Mathematics
Cornell University

Title: TBA

Abstract: TBA

CAM Colloquium: Yuanzhao Zhang

Jan 29, 4:00–5:00 pm, Jones 303

Yuanzhao Zhang
Santa Fe Institute

Title: TBA

Abstract: TBA
