Topological Generalization Bounds for Discrete-Time Stochastic Optimization Algorithms
Rayna Andreeva, Benjamin Dupuis, Rik Sarkar, Tolga Birdal, Umut Şimşekli
TL;DR
The paper develops a discrete-time, topology-based framework for bounding the generalization error of stochastic optimization algorithms. It introduces two topological complexity measures: $\alpha$-weighted lifetime sums ($\boldsymbol{E}_\alpha$) and positive magnitude ($\mathbf{PMag}$). Together with a total mutual information term $\mathrm{I}_\infty$, these complexities yield non-asymptotic generalization bounds that apply to practical, finite training trajectories. The authors provide computable, data-driven procedures to estimate these complexities and demonstrate strong correlations with generalization across vision transformers and graph neural networks, often outperforming prior persistent-homology bounds. The work offers a scalable, domain-agnostic toolkit for assessing generalization in discrete-time DNN training.
Abstract
We present a novel set of rigorous and computationally efficient topology-based complexity notions that exhibit a strong correlation with the generalization gap in modern deep neural networks (DNNs). DNNs show remarkable generalization properties, yet the source of these capabilities remains elusive, defying established statistical learning theory. Recent studies have revealed that properties of training trajectories can be indicative of generalization. Building on this insight, state-of-the-art methods have leveraged the topology of these trajectories, particularly their fractal dimension, to quantify generalization. Most existing works compute this quantity by assuming continuous- or infinite-time training dynamics, complicating the development of practical estimators capable of accurately predicting generalization without access to test data. In this paper, we respect the discrete-time nature of training trajectories and investigate the underlying topological quantities that are amenable to topological data analysis tools. This leads to a new family of reliable topological complexity measures that provably bound the generalization error, eliminating the need for restrictive geometric assumptions. These measures are computationally friendly, enabling us to propose simple yet effective algorithms for computing generalization indices. Moreover, our flexible framework can be extended to different domains, tasks, and architectures. Our experimental results demonstrate that our new complexity measures correlate highly with generalization error in industry-standard architectures such as transformers and deep graph networks. Our approach consistently outperforms existing topological bounds across a wide range of datasets, models, and optimizers, highlighting the practical relevance and effectiveness of our complexity measures.
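To make the two complexity measures concrete, the following is a minimal sketch of how they might be computed on a finite set of trajectory points. It is not the authors' implementation and rests on several assumptions: the points are compared with the plain Euclidean distance (the paper may use a different, data-dependent distance); the $\alpha$-weighted lifetime sum is taken in homological degree 0, where the Vietoris-Rips lifetimes coincide with the edge lengths of a minimum spanning tree; and positive magnitude is read as the sum of the positive parts of the standard magnitude weights. The function names and the `scale` parameter are hypothetical.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance matrix (assumed metric, not necessarily the paper's)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mst_edge_lengths(dist):
    """Edge lengths of a minimum spanning tree, via Prim's algorithm."""
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()  # cheapest known connection of each point to the tree
    edges = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        edges.append(candidates[j])
        in_tree[j] = True
        best = np.minimum(best, dist[j])
    return np.array(edges)


def alpha_weighted_lifetime_sum(points, alpha=1.0):
    """Degree-0 sketch of E_alpha: sum of MST edge lengths raised to alpha."""
    lengths = mst_edge_lengths(pairwise_distances(points))
    return float((lengths ** alpha).sum())


def positive_magnitude(points, scale=1.0):
    """Sketch of PMag: sum of the positive parts of the magnitude weights.

    The weights w solve Z w = 1 with Z_ij = exp(-scale * d_ij).
    Duplicate points make Z singular, so they should be removed beforehand.
    """
    similarity = np.exp(-scale * pairwise_distances(points))
    weights = np.linalg.solve(similarity, np.ones(len(points)))
    return float(np.clip(weights, 0.0, None).sum())
```

In practice, `points` would hold the flattened weight vectors collected over the last iterations of training, one row per iterate. For long trajectories or high-dimensional weights, the dense distance matrix becomes the bottleneck, so subsampling the iterates is a natural first step.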
