Deep learning and operator-valued free probability: training and generalization dynamics in high dimensions
ITS Seminar with Jeffrey Pennington (Google Brain)
One of the distinguishing characteristics of modern deep learning systems is that they typically employ neural network architectures with enormous numbers of parameters, often in the millions and sometimes even in the billions. While this paradigm has recently inspired a broad research effort on the properties of large networks, comparatively little attention has been paid to the fact that these networks are often used to model large, complex datasets, which may themselves contain millions or even billions of constraints. In this talk, I will present a formalism based on operator-valued free probability that enables exact predictions of training and generalization performance in the high-dimensional regime in which both the dataset size and the number of features tend to infinity. The analysis provides one of the first analytically tractable models that captures the effects of early stopping, over- and under-parameterization, and explicit regularization, and that exhibits the characteristic double-descent curve.
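For readers unfamiliar with double descent, the following is a minimal numerical sketch, not the speaker's operator-valued free probability formalism: it fits a random-features least-squares model while sweeping the ratio of features to training samples through the interpolation threshold, where test error typically peaks before descending again. The ReLU feature map, problem sizes, and noise level are all illustrative assumptions.

    import numpy as np

    # Illustrative double-descent experiment (assumed setup, not the talk's method):
    # min-norm least squares on random ReLU features of a linear teacher.
    rng = np.random.default_rng(0)

    n_train, n_test, d = 200, 1000, 50        # training samples, test samples, input dim
    noise = 0.1                               # label noise standard deviation

    w_star = rng.standard_normal(d) / np.sqrt(d)          # ground-truth linear teacher
    X_train = rng.standard_normal((n_train, d))
    X_test = rng.standard_normal((n_test, d))
    y_train = X_train @ w_star + noise * rng.standard_normal(n_train)
    y_test = X_test @ w_star

    def test_error(n_features):
        """Fit min-norm least squares on random ReLU features; return test MSE."""
        W = rng.standard_normal((d, n_features)) / np.sqrt(d)   # random projection
        F_train = np.maximum(X_train @ W, 0.0)                  # ReLU feature map
        F_test = np.maximum(X_test @ W, 0.0)
        coef, *_ = np.linalg.lstsq(F_train, y_train, rcond=None)  # min-norm solution
        return np.mean((F_test @ coef - y_test) ** 2)

    # Sweep the feature count through the interpolation threshold (n_features = n_train);
    # the test error typically spikes near ratio 1 and decreases again beyond it.
    for n_features in (20, 50, 100, 150, 190, 200, 210, 250, 400, 800, 1600):
        err = np.mean([test_error(n_features) for _ in range(5)])  # average a few draws
        print(f"features/samples = {n_features / n_train:5.2f}   test MSE = {err:8.4f}")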