1:30–5:30 pm
Stevanovich Center
MS 112
5727 S. University Avenue
Chicago, Illinois 60637
Lectures 3 & 4: Scalable kernel methods
Speaker: David Bindel, Cornell University
Wednesday, June 19, 2019, at 1:30-2:30 PM and 3:00-4:00 PM
Kernel methods are used throughout statistical modeling, data science, and approximation theory. Depending on the community, they may be introduced in many different ways: through dot products of feature maps, through data-adapted basis functions in an interpolation space, through the natural structure of a reproducing kernel Hilbert space, or through the covariance structure of a Gaussian process. We describe these various interpretations and their relation to each other, and then turn to the key computational bottleneck for all kernel methods: the solution of linear systems and the computation of (log) determinants for dense matrices whose size scales with the number of examples. Recent developments in linear algebra make it increasingly feasible to solve these problems efficiently even with millions of data points. We discuss some of these techniques, including rank-structured factorization, structured kernel interpolation, and stochastic estimators for determinants and their derivatives. We also give a perspective on some open problems and on approaches to addressing the constant challenge posed by the curse of dimensionality.
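To make the computational bottleneck concrete, the following sketch (not taken from the lectures; the squared-exponential kernel, the synthetic data, and the Hutchinson-style probe estimator are illustrative assumptions) forms a dense kernel matrix, solves the associated linear system by Cholesky factorization, and compares the exact log-determinant with a stochastic estimate of the kind discussed in the talk.

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    def rbf_kernel(X, lengthscale=1.0):
        # Dense n x n squared-exponential kernel matrix.
        sq = np.sum(X ** 2, axis=1)
        d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
        return np.exp(-0.5 * d2 / lengthscale ** 2)

    rng = np.random.default_rng(0)
    n, d = 800, 3
    X = rng.standard_normal((n, d))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

    # Regularized kernel matrix (the jitter term keeps it positive definite).
    K = rbf_kernel(X) + 1e-4 * np.eye(n)

    # Direct approach: O(n^3) Cholesky factorization -- the cost that motivates
    # rank-structured factorizations and structured kernel interpolation at large n.
    c, low = cho_factor(K)
    alpha = cho_solve((c, low), y)                  # kernel weights: solves K alpha = y
    logdet_exact = 2.0 * np.sum(np.log(np.diag(c)))

    # Stochastic (Hutchinson-type) estimate of log det K = tr(log K) using
    # Rademacher probe vectors. Here log(K) is applied via a full eigendecomposition
    # purely for clarity; at scale one would use Lanczos/Chebyshev approximations
    # so that only matrix-vector products with K are needed.
    w, V = np.linalg.eigh(K)
    probes = rng.choice([-1.0, 1.0], size=(n, 16))
    logK_probes = V @ (np.log(w)[:, None] * (V.T @ probes))
    logdet_est = np.mean(np.einsum('ij,ij->j', probes, logK_probes))

    print(f"exact log det: {logdet_exact:.3f}, stochastic estimate: {logdet_est:.3f}")

At large n the dense factorization and the dense application of log(K) become infeasible, which is exactly where rank-structured factorizations, structured kernel interpolation, and matrix-free stochastic estimators enter.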
Seminar II: Empirical risk minimization over deep neural networks overcomes the curse of dimensionality in the numerical approximation of Kolmogorov PDEs
Speaker: Julius Berner, University of Vienna
Wednesday, June 19, 2019, at 4:30-5:30 PM
Recently, methods based on empirical risk minimization (ERM) over deep neural network hypothesis classes have been applied to the numerical solution of PDEs with great success. We consider under which conditions ERM over a neural network hypothesis class approximates, with high probability, the solution of a d-dimensional Kolmogorov PDE with affine drift and diffusion coefficients up to error ε. We establish that such an approximation can be achieved with both the size of the hypothesis class and the number of training samples scaling only polynomially in d and 1/ε.
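The following sketch illustrates the ERM scheme in a simple special case; the choices here (the heat equation as the Kolmogorov PDE, i.e. zero drift and unit diffusion, the initial condition φ(x) = ||x||², and a small ReLU network trained with Adam) are assumptions made for illustration, not the setting analyzed in the talk. Labels are drawn from the Feynman-Kac representation u(T, x) = E[φ(x + W_T)], one Monte Carlo sample per training point, and the empirical squared risk is minimized over the network parameters.

    import torch
    import torch.nn as nn

    d, T = 10, 1.0                       # spatial dimension, terminal time
    n_samples = 20000                    # number of i.i.d. training samples

    # Draw x_i uniformly on [-1, 1]^d and one Brownian increment per sample;
    # the noisy label phi(x_i + W_T) is an unbiased sample of u(T, x_i).
    x = 2.0 * torch.rand(n_samples, d) - 1.0
    w = torch.randn(n_samples, d) * T ** 0.5
    y = ((x + w) ** 2).sum(dim=1, keepdim=True)

    model = nn.Sequential(
        nn.Linear(d, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 1),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for step in range(2000):             # minimize the empirical risk on minibatches
        idx = torch.randint(0, n_samples, (256,))
        opt.zero_grad()
        loss = loss_fn(model(x[idx]), y[idx])
        loss.backward()
        opt.step()

    # For this choice of phi the PDE solution is known in closed form,
    # u(T, x) = ||x||^2 + d*T, which allows a direct check of the error.
    with torch.no_grad():
        x_test = 2.0 * torch.rand(2000, d) - 1.0
        u_true = (x_test ** 2).sum(dim=1, keepdim=True) + d * T
        err = (model(x_test) - u_true).abs().mean().item()
    print(f"mean absolute error on test points: {err:.3f}")

The network width, sample count, and optimizer settings above are kept small so the sketch runs quickly; they are not the quantities whose polynomial scaling in d and 1/ε is established in the talk.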