
Joint Statistics and DSI Colloquium: Tudor Manole
11:30 am–12:30 pm DSI 105
Tudor Manole
Institute for Data, Systems, and Society
Massachusetts Institute of Technology
Title: “A Statistical Framework for Benchmarking Quantum Computers”
Abstract: Recent years have witnessed quantum computing technologies increasingly move from theoretical proposals to functioning experimental platforms, reaching major milestones such as the demonstration of beyond-classical computational tasks. Despite these exciting advances, current quantum computers experience hardware-level errors that limit their scalability and that must be carefully identified before they can be mitigated. In this talk, I will develop a statistical framework for characterizing errors in quantum devices, using an existing experimental protocol known as random circuit sampling. Data arising from this experiment can be described through a high-dimensional discrete latent variable model parametrized by the device’s error rates. We develop estimators for these error rates which are provably consistent even for large-scale quantum devices. We then apply our methods to benchmark a recent state-of-the-art quantum processor, obtaining a detailed report of error rates that were largely unavailable from past studies. I will close by placing these results in the broader context of my interdisciplinary work in the physical sciences, and by discussing some of my other research interests in nonparametric statistics and statistical optimal transport.

CAM Colloquium: Yijun Dong
4:00–5:00 pm Jones 303
Yijun Dong
Courant Institute of Mathematical Sciences
New York University
Title: “Understanding Post-training through the Lens of Intrinsic Dimension”
Abstract: Post-training is becoming the primary interface between powerful pre-trained models and challenging real-world problems, where we aim to adapt large pre-trained models via limited, heterogeneous data while preserving their capabilities and reliability. In this talk, we take a step toward a unified theoretical and algorithmic framework for post-training through the lens of intrinsic dimension. In particular, we focus on an emerging post-training phenomenon, weak-to-strong (W2S) generalization, in which a strong pre-trained student model fine-tuned only with supervision from a weaker teacher model can often outperform its teacher. Theoretically, we explain when and why W2S generalization occurs from a sample-efficiency perspective, reveal the value of teacher-student discrepancy for W2S, and investigate the effects of systematic biases on W2S. Algorithmically, we propose a practical, theory-inspired remedy for W2S under spurious correlation. The talk will conclude with an outlook on the broad applications of random matrix tools for understanding and improving post-training.

CAM Colloquium: Eitan Levin
4:00–5:00 pm Jones 303
Eitan Levin
Applied and Computational Mathematics
California Institute of Technology
Title: “Any-Dimensional Data Science”
Abstract: Many applications throughout data science require methods that are well-defined and performant for problems or data of any size. In machine learning, we are given training data from which we wish to learn algorithms capable of solving problems of any size. In particular, the learned algorithm must generalize to inputs of sizes that are not present in the training set. For example, algorithms for processing graphs or point clouds must generalize to inputs with any number of nodes or points. A second challenge pertaining to any-dimensionality arises in applications such as game theory or network statistics, in which we wish to characterize solutions to problems of growing size. Examples include computing values of games with any number of players, or proving moment inequalities for random vectors and graphs of any size. From an optimization perspective, this amounts to deriving bounds that hold for entire sequences of problems of growing dimensionality. Finally, in applications involving graph-valued data, we wish to produce constant-sized summaries of arbitrarily large networks that preserve their essential structural properties. These summaries can then be used for efficiently testing properties of the underlying large network; e.g., testing for the presence of hubs, which is of interest in massive biological and traffic networks. We develop a unified framework to tackle such any-dimensional problems by using random sampling maps to compare and summarize objects of different sizes. Our methodology leverages new de Finetti-type theorems and the recently identified phenomenon of representation stability. We illustrate the resulting framework for any-dimensional problems in several applications.

Statistics Colloquium: Sungwoo Jeong
11:30 am–12:30 pm Jones 303
Sungwoo Jeong
Department of Mathematics
Cornell University
Title: TBA
Abstract: TBA

CAM Colloquium: Yuanzhao Zhang
4:00–5:00 pm Jones 303
Yuanzhao Zhang
Santa Fe Institute
Title: TBA
Abstract: TBA