Entropic Confinement and Mode Connectivity in Overparameterized Neural Networks

Luca Di Carlo; Chase Goddard; David J. Schwab

TL;DR

The study tackles a puzzle: minima are connected by low-loss paths, as if they sat in a single connected valley, yet SGD remains localized near one of them. It introduces curvature-induced entropic forces, arising from SGD noise, that bias the dynamics toward flatter regions, formalized via an effective potential $V_{eff}(y) = T \ln g(y)$. Using AutoNEB-generated minimum-energy paths between CIFAR-10 minima and a suite of curvature diagnostics, the authors show that curvature rises systematically away from the endpoints, creating entropic barriers that persist longer than energetic barriers and drive late-stage localization. These findings refine the valley metaphor into a curvature-modulated landscape, with implications for linear mode connectivity, generalization, and ensembling techniques such as SWA.

Abstract

Modern neural networks exhibit a striking property: basins of attraction in the loss landscape are often connected by low-loss paths, yet optimization dynamics generally remain confined to a single convex basin and rarely explore intermediate points. We resolve this paradox by identifying entropic barriers arising from the interplay between curvature variations along these paths and noise in optimization dynamics. Empirically, we find that curvature systematically rises away from minima, producing effective forces that bias noisy dynamics back toward the endpoints, even when the loss remains nearly flat. These barriers persist longer than energetic barriers, shaping the late-time localization of solutions in parameter space. Our results highlight the role of curvature-induced entropic forces in governing both connectivity and confinement in deep learning landscapes.
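
To make the entropic-barrier idea concrete, the following is a minimal toy sketch, not the authors' experiment. It assumes a two-dimensional valley with path coordinate `y` and one transverse coordinate `x`, a loss that is exactly zero along the path, and noisy dynamics whose stationary density is Boltzmann-like at temperature `T` (an idealization of SGD noise). The summary above does not define $g(y)$; here it is assumed to be a transverse stiffness, so that the transverse curvature is $g(y)^2$. The profile `g`, the value of `T`, and all function names are hypothetical. Under these assumptions, integrating out `x` should reproduce $V_{eff}(y) = T \ln g(y)$ up to an additive constant.

```python
import math

T = 0.1  # noise temperature (assumed value, for illustration only)


def g(y):
    # Hypothetical stiffness profile along the path, y in [0, 1]:
    # softest at the endpoints (y = 0 and y = 1), stiffest at the midpoint.
    return 1.0 + 8.0 * y * (1.0 - y)


def loss(x, y):
    # Toy valley: the loss is zero along the path x = 0 for every y,
    # and the transverse curvature is g(y)**2 (an assumption of this sketch).
    return 0.5 * (g(y) * x) ** 2


def marginal_weight(y, half_width=5.0, n=4001):
    # Trapezoidal integral of exp(-loss / T) over the transverse coordinate x.
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        x = -half_width + i * h
        w = math.exp(-loss(x, y) / T)
        total += w if 0 < i < n - 1 else 0.5 * w
    return total * h


def v_eff_numeric(y):
    # Effective potential read off the marginal density, up to a constant.
    return -T * math.log(marginal_weight(y))


def v_eff_formula(y):
    # Closed form quoted in the summary, with g taken as the stiffness above.
    return T * math.log(g(y))


ys = [i / 10 for i in range(11)]
# Shift both curves so that they vanish at the endpoint y = 0.
numeric = [v_eff_numeric(y) - v_eff_numeric(0.0) for y in ys]
formula = [v_eff_formula(y) - v_eff_formula(0.0) for y in ys]

max_gap = max(abs(a - b) for a, b in zip(numeric, formula))
barrier = max(formula)  # entropic barrier height at the stiffest point

print(f"largest gap between numeric and closed form: {max_gap:.2e}")
print(f"entropic barrier height: {barrier:.4f}")
```

In this toy model the loss along the path is identically zero, so there is no energetic barrier at all; the barrier in the effective potential comes entirely from the narrowing of the valley, and its height scales linearly with `T`.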
