Understanding the Generalization Error of Markov algorithms through Poissonization
Benjamin Dupuis, Maxime Haddouche, George Deligiannidis, Umut Simsekli
TL;DR
This work presents a general Poissonization framework for analyzing the generalization error of Markov learning algorithms, translating discrete-time updates into a continuous-time Poissonized process with a tractable entropy flow. The authors derive a closed-form differential equation for the evolution of the PAC-Bayes KL divergence between the posterior and prior dynamics, decomposing it into an expansion term and a Bregman term. Connecting the Bregman term to modified log-Sobolev inequalities (LSIs) yields time-uniform generalization bounds that apply to both noisy (e.g., SGLD) and non-noisy (e.g., SGD) algorithms, and the paper shows how diffusive priors yield explicit constants that improve the bounds. The framework unifies several strands of prior work, recovers known results in Poissonized form, and offers new bounds via first- and second-order expansions, with depoissonization discussed as a bridge back to discrete iterations. Overall, the Poissonization approach extends entropy-flow-based generalization analyses to a wider class of learning algorithms and noise structures, enabling sharper, time-uniform guarantees with principled priors.
Abstract
Using continuous-time stochastic differential equation (SDE) proxies for stochastic optimization algorithms has proven fruitful for understanding their generalization abilities. A significant part of these approaches is based on so-called "entropy flows", which greatly simplify the generalization analysis. Unfortunately, such well-structured entropy flows cannot be obtained for most discrete-time algorithms, and the existing SDE approaches remain limited to specific noise and algorithmic structures. We aim to alleviate this issue by introducing a generic framework for analyzing the generalization error of Markov algorithms through "Poissonization", a continuous-time approximation of discrete-time processes with formal approximation guarantees. Through this approach, we first develop a novel entropy flow, which directly leads to PAC-Bayesian generalization bounds. We then draw novel links to modified versions of the celebrated logarithmic Sobolev inequalities (LSI), identify cases where such LSIs are satisfied, and obtain improved bounds. Beyond its generality, our framework allows exploiting specific properties of learning algorithms. In particular, we incorporate the noise structure of different algorithm types, namely those with additional noise injections (noisy) and those without (non-noisy), through various technical tools. This illustrates the capacity of our methods to achieve both known (yet Poissonized) and new generalization bounds.
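As a rough illustration of the Poissonization idea, the sketch below runs a discrete-time Markov update at the jump times of a Poisson process, so that the continuous-time path is X_t = Y_{N_t} for discrete iterates Y_k and a Poisson counting process N_t. This is a minimal sketch that assumes this standard construction with rate 1; the toy noisy-gradient update on f(w) = w^2 / 2 and the names `sgd_step`, `poissonize`, and `state_at` are hypothetical and are not taken from the paper.

```python
import bisect
import random


def sgd_step(w, rng, lr=0.1, noise_std=0.5):
    """One step of a toy Markov algorithm (hypothetical example):
    SGD on f(w) = w^2 / 2 with a Gaussian-perturbed gradient."""
    grad = w + noise_std * rng.gauss(0.0, 1.0)
    return w - lr * grad


def poissonize(step, w0, horizon, rng, rate=1.0):
    """Simulate X_t = Y_{N_t} on [0, horizon].

    The discrete update `step` is applied once at each jump of a Poisson
    process with intensity `rate`; jump times are built from independent
    exponential waiting times. Returns the jump times (with 0.0 prepended)
    and the state held from each of those times onward.
    """
    times, states = [0.0], [w0]
    t = rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        states.append(step(states[-1], rng))
        t += rng.expovariate(rate)
    return times, states


def state_at(t, times, states):
    """Evaluate the piecewise-constant path at time t >= 0."""
    return states[bisect.bisect_right(times, t) - 1]


rng = random.Random(0)
times, states = poissonize(sgd_step, w0=1.0, horizon=20.0, rng=rng)
print("number of updates:", len(times) - 1)
print("state at t = 10:", state_at(10.0, times, states))
```

With rate 1, the number of updates performed by time t is Poisson-distributed with mean t, so the Poissonized path applies about as many updates as the discrete algorithm would in t iterations, while the random update times are what give it a continuous-time structure.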
