Reading Group · Summer 2026

Theory of Generalization Reading Group

A reading group on why modern neural networks generalize despite overparameterization, interpolation, and their ability to fit random labels.

Overview

This reading group asks why modern neural networks generalize despite being highly overparameterized, capable of fitting random labels, and often trained far beyond interpolation. We follow the story from classical statistical learning theory to the empirical puzzles that challenged it, including double descent and benign overfitting, then examine proposed explanations based on implicit bias, Bayesian and PAC-Bayes perspectives, compression, grokking, and sparse subnetworks.

Across the sessions, the guiding question is: which parts of deep-learning generalization are now understood, and which phenomena still require new theory?

Schedule

Next topic
Implicit Bias and Soft Inductive Biases
24 Jun 2026 · Implicit regularization, SGD bias, simplicity preferences, and soft inductive biases.

Schedule and readings may change as the group evolves. Slides and recordings will be added when available.

| Date | Topic | Readings | Slides | Recording |
| --- | --- | --- | --- | --- |
| 03 Jun 2026 | Classical theory of generalization. VC dimension, Rademacher complexity, uniform convergence. | Understanding Machine Learning, Part I selections | Slides | - |
| 10 Jun 2026 | Understanding Deep Learning Requires Rethinking Generalization. Random labels, interpolation, and whether VC dimension is the wrong notion of complexity. | Zhang et al., 2017 | Slides | - |
| 17 Jun 2026 | Double Descent and Benign Overfitting. Why can test error decrease again after interpolation? | Reconciling modern machine-learning practice and the classical bias-variance trade-off | Slides | - |
| 24 Jun 2026 | Implicit Bias and Soft Inductive Biases. Implicit regularization, SGD bias, simplicity preferences, and soft inductive biases. | Implicit Bias of Gradient Descent on Separable Data; Bayesian Deep Learning and a Probabilistic Perspective of Generalization; Averaging Weights Leads to Wider Optima and Better Generalization | TBA | TBA |
| 01 Jul 2026 (Tentative invited talk) | Do LLMs actually generalize? PAC/VC bounds, overparameterization, compression, simplicity, and LLM generalization. | Non-Vacuous Generalization Bounds for Large Language Models; Unlocking tokens as data points for generalization bounds on larger language models | TBA | TBA |
| 08 Jul 2026 | Deep Learning Is Not So Mysterious or Different. Which mysteries disappear under PAC-Bayes/compression viewpoints, and which remain? | Deep Learning is Not So Mysterious or Different | TBA | TBA |
| 15 Jul 2026 (Open discussion) | Grokking: Generalization Beyond Overfitting. Does grokking support or challenge the view that deep learning is not so mysterious? | Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets; Grokking as the Transition from Lazy to Rich Training Dynamics | TBA | TBA |
| 22 Jul 2026 (Tentative invited talk) | Lottery Ticket Hypothesis. Do neural networks succeed because training discovers sparse winning tickets? | The Lottery Ticket Hypothesis | TBA | TBA |

Organizer

Organized by Abir Harrasse at the Jinesis AI Lab, University of Toronto.