A reading group on why modern neural networks generalize despite overparameterization, interpolation, and their ability to fit random labels.
This reading group asks why modern neural networks generalize despite being highly overparameterized, capable of fitting random labels, and often trained far beyond interpolation. We follow the story from classical statistical learning theory to the empirical puzzles that challenged it, including double descent and benign overfitting, then examine proposed explanations based on implicit bias, Bayesian and PAC-Bayes perspectives, compression, grokking, and sparse subnetworks.
Across the sessions, the guiding question is: which parts of deep-learning generalization are now understood, and which phenomena still require new theory?
Schedule and readings may change as the group evolves. Slides and recordings will be added when available.
| Date | Topic | Readings | Slides | Recording |
|---|---|---|---|---|
| 03 Jun 2026 | Classical theory of generalization<br>VC dimension, Rademacher complexity, uniform convergence. | Understanding Machine Learning, Part I selections. | Slides | - |
| 10 Jun 2026 | Understanding Deep Learning Requires Rethinking Generalization<br>Random labels, interpolation, and whether VC dimension is the wrong notion of complexity. | Zhang et al., 2017 | Slides | - |
| 17 Jun 2026 | Double Descent and Benign Overfitting<br>Why can test error decrease again after interpolation? | Reconciling modern machine-learning practice and the classical bias-variance trade-off | Slides | - |
| 24 Jun 2026 | Implicit Bias and Soft Inductive Biases<br>Implicit regularization, SGD bias, simplicity preferences, and soft inductive biases. | The Implicit Bias of Gradient Descent on Separable Data<br>Bayesian Deep Learning and a Probabilistic Perspective of Generalization<br>Averaging Weights Leads to Wider Optima and Better Generalization | TBA | TBA |
| 01 Jul 2026 | Tentative invited talk<br>Do LLMs actually generalize?<br>PAC/VC bounds, overparameterization, compression, simplicity, and LLM generalization. | Non-Vacuous Generalization Bounds for Large Language Models<br>Unlocking tokens as data points for generalization bounds on larger language models | TBA | TBA |
| 08 Jul 2026 | Deep Learning Is Not So Mysterious or Different<br>Which mysteries disappear under PAC-Bayes/compression viewpoints, and which remain? | Deep Learning is Not So Mysterious or Different | TBA | TBA |
| 15 Jul 2026 | Open discussion<br>Grokking: Generalization Beyond Overfitting<br>Does grokking support or challenge the view that deep learning is not so mysterious? | Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets<br>Grokking as the Transition from Lazy to Rich Training Dynamics | TBA | TBA |
| 22 Jul 2026 | Tentative invited talk<br>Lottery Ticket Hypothesis<br>Do neural networks succeed because training discovers sparse winning tickets? | The Lottery Ticket Hypothesis | TBA | TBA |
Organized by Abir Harrasse at the Jinesis AI Lab, University of Toronto.