Pierfrancesco Urbani, Université Paris-Saclay, CNRS, CEA, IPhT: Separation of timescales controls feature learning and overfitting in large neural networks
Overview
- Date: 14 April 2026, 13:15–14:00
- Location: MV:L15, Chalmers tvärgata 3
- Language: English
Abstract: To understand the inductive bias and generalization capabilities of large, overparameterized machine learning models, it is essential to analyze the dynamics of their training algorithms. Using dynamical mean field theory, we investigate the learning dynamics of large two-layer neural networks. We find that, in the large-width regime, the training process exhibits a separation of timescales. This leads to several key observations:
- The emergence of a slow timescale linked to the growth in Gaussian/Rademacher complexity of the network;
- An inductive bias favoring low complexity when the initial model complexity is sufficiently small;
- A dynamical decoupling between feature learning and overfitting phases;
- A non-monotonic trend in test error, characterized by a “feature unlearning” regime at later stages of training.
Joint work with Andrea Montanari.
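As a rough illustration of the setting only (not the speakers' analysis, which uses dynamical mean field theory at large width), the following hypothetical sketch trains a two-layer network with plain gradient descent on synthetic data from a noisy linear teacher; all sizes, scalings, and hyperparameters are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic setup: a two-layer network
# f(x) = a . tanh(W x) / sqrt(width), trained by gradient descent
# on labels from a noisy linear teacher.
n, d, width = 200, 20, 512
X = rng.standard_normal((n, d)) / np.sqrt(d)    # inputs with ~unit-norm rows
teacher = rng.standard_normal(d)
y = X @ teacher + 0.1 * rng.standard_normal(n)  # noisy teacher labels

W = rng.standard_normal((width, d))             # first-layer weights
a = rng.standard_normal(width)                  # second-layer weights

lr, losses = 0.5, []
for _ in range(500):
    h = np.tanh(X @ W.T)                        # hidden activations, (n, width)
    err = h @ a / np.sqrt(width) - y            # residuals on the training set
    losses.append(np.mean(err**2))
    # Gradients of the mean-squared error (constant factors absorbed into lr)
    grad_a = h.T @ err / (n * np.sqrt(width))
    grad_W = ((err[:, None] * (1 - h**2) * a[None, :]).T @ X) / (n * np.sqrt(width))
    a -= lr * grad_a
    W -= lr * grad_W

print(f"train MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Tracking the test error on held-out data alongside `losses` is what would expose the non-monotonic behaviour the abstract describes; this sketch only records the training loss.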
Jan Gerken
- Assistant Professor, Algebra and Geometry, Mathematical Sciences
