Reading Group · Summer 2026

Theory of Generalization Reading Group

A reading group on why modern neural networks generalize despite overparameterization, interpolation, and their ability to fit random labels.

Overview

This reading group asks why modern neural networks generalize despite being highly overparameterized, capable of fitting random labels, and often trained far beyond interpolation. We follow the story from classical statistical learning theory to the empirical puzzles that challenged it, including double descent and benign overfitting, then examine proposed explanations based on implicit bias, Bayesian and PAC-Bayes perspectives, compression, grokking, and sparse subnetworks.

Across the sessions, the guiding question is: which parts of deep-learning generalization are now understood, and which phenomena still require new theory?

Schedule

Schedule and readings may change as the group evolves. Slides will be added when available.

Date	Topic	Readings	Slides
03 Jun 2026	Classical theory of generalization: VC dimension, Rademacher complexity, uniform convergence.	Understanding Machine Learning, selections from Part I	Slides
10 Jun 2026	Understanding Deep Learning Requires Rethinking Generalization: Random labels, interpolation, and whether VC dimension is the wrong notion of complexity.	Zhang et al., 2017	Slides
17 Jun 2026	Double Descent and Benign Overfitting: Why can test error decrease again after interpolation?	Reconciling modern machine-learning practice and the classical bias-variance trade-off	Slides
24 Jun 2026	Implicit Bias and Soft Inductive Biases: Implicit regularization, SGD bias, simplicity preferences, and soft inductive biases.	Implicit Bias of Gradient Descent on Separable Data; Bayesian Deep Learning and a Probabilistic Perspective of Generalization; Averaging Weights Leads to Wider Optima and Better Generalization	TBA
03 Jul 2026	Compression, Nonuniform Learnability, and PAC-Bayes: Can compression and PAC-Bayes explain why deep networks generalize? One proposed account is that simple functions occupy exponentially larger regions of parameter space, making them easier for learning algorithms to find.	A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks	Slides
