Generalization Bounds for Dependent Data using Online-to-Batch Conversion

Sagnik Chatterjee, Manuj Mukherjee, Alhad Sethi

TL;DR

This work extends the Online-to-Batch (OTB) framework to non-i.i.d. data. For learners trained on samples drawn from mixing processes, it proves generalization bounds that match the i.i.d. bounds up to a term dictated by the decay rate of the mixing process. It introduces a new notion of Wasserstein stability for online learners and shows that the Exponentially Weighted Average (EWA) online learner satisfies it. This lets the bounds be instantiated for any batch learner without requiring the batch algorithm itself to be stable. The results include generalization bounds both in expectation and with high probability, with explicit instantiations under geometric $\phi$-mixing that involve the KL divergence between the distribution of the batch learner's output and a reference prior. The framework broadens the analysis of generalization on non-i.i.d. data, with implications for learning from dependent data such as time series, and suggests ways to bring stability notions inspired by differential privacy into online-to-batch analyses.
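For readers unfamiliar with EWA, the following is a minimal sketch of the textbook forecaster over a finite hypothesis class. It assumes a finite set of hypotheses, a prior `prior`, per-round losses in [0, 1], and a learning rate `eta`; the function names are hypothetical, and the sketch does not reproduce the specific online game the paper constructs for its conversion. At each round the learner plays the Gibbs distribution obtained by reweighting the prior with the exponentiated negative cumulative loss. This distribution is also the one that minimizes `eta` times the expected cumulative loss plus the KL divergence to the prior, which is where a KL-to-prior term naturally enters.

```python
import math
from typing import List, Sequence, Tuple


def ewa_weights(prior: Sequence[float],
                cumulative_losses: Sequence[float],
                eta: float) -> List[float]:
    """Gibbs/EWA distribution: w(h) is proportional to prior(h) * exp(-eta * L(h))."""
    # Subtracting the minimum loss avoids underflow; the shift cancels after normalizing.
    m = min(cumulative_losses)
    unnorm = [p * math.exp(-eta * (L - m)) for p, L in zip(prior, cumulative_losses)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def run_ewa(prior: Sequence[float],
            loss_rounds: Sequence[Sequence[float]],
            eta: float) -> Tuple[List[List[float]], float]:
    """Play EWA; loss_rounds[t][i] is the loss of hypothesis i at round t.

    Returns the distributions played (each chosen before that round's losses
    are revealed) and the learner's total expected loss.
    """
    cum = [0.0] * len(prior)
    played: List[List[float]] = []
    total = 0.0
    for losses in loss_rounds:
        w = ewa_weights(prior, cum, eta)
        played.append(w)
        # Expected loss of the randomized prediction under w.
        total += sum(wi * li for wi, li in zip(w, losses))
        cum = [c + l for c, l in zip(cum, losses)]
    return played, total


# Usage: three hypotheses, uniform prior, hypothesis 0 never incurs loss.
prior = [1 / 3, 1 / 3, 1 / 3]
rounds = [[0.0, 1.0, 1.0], [0.0, 1.0, 0.5], [0.0, 1.0, 1.0]]
played, total = run_ewa(prior, rounds, eta=1.0)
print([round(x, 3) for x in played[-1]])
```

With a uniform prior over N hypotheses and losses in [0, 1], the standard regret guarantee for this forecaster is ln N / eta + eta T / 8 over T rounds; the usage example above stays within that bound.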

Abstract

In this work, we upper bound the generalization error of batch learning algorithms trained on samples drawn from a mixing stochastic process (i.e., a dependent data source), both in expectation and with high probability. Unlike previous results by Mohri et al. (2010) and Fu et al. (2023), our work does not require any stability assumptions on the batch learner, which allows us to derive upper bounds for any batch learning algorithm trained on dependent data. This is made possible by our use of the Online-to-Batch (OTB) conversion framework, which shifts the burden of stability from the batch learner to an artificially constructed online learner. We show that our bounds equal the bounds in the i.i.d. setting up to a term that depends on the decay rate of the underlying mixing stochastic process. Central to our analysis is a new notion of algorithmic stability for online learning algorithms based on Wasserstein distances of order one. Furthermore, we prove that the Exponentially Weighted Average (EWA) algorithm, a textbook family of online learning algorithms, satisfies our new notion of stability. We then instantiate our bounds using the EWA algorithm.
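Since the stability notion above is built on the Wasserstein distance of order one (W1), a small numerical sketch of that distance may help. The helpers `w1_empirical` and `w1_on_support` are hypothetical names and are restricted to distributions on the real line, where W1 has closed forms. The paper's actual stability definition is not reproduced here; this only illustrates the distance itself.

```python
from typing import Sequence


def w1_empirical(xs: Sequence[float], ys: Sequence[float]) -> float:
    """W1 between two uniform empirical measures on the real line with the same
    number of atoms. In one dimension the monotone (sorted) matching is an
    optimal coupling, so W1 is the mean gap between order statistics."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("sketch assumes two non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def w1_on_support(support: Sequence[float],
                  p: Sequence[float],
                  q: Sequence[float]) -> float:
    """W1 between two distributions on the same sorted support points, using
    W1(P, Q) = integral of |F_P(t) - F_Q(t)| dt over the real line."""
    total = 0.0
    cdf_gap = 0.0  # value of F_P - F_Q on [support[i], support[i + 1])
    for i in range(len(support) - 1):
        cdf_gap += p[i] - q[i]
        total += abs(cdf_gap) * (support[i + 1] - support[i])
    return total


# Shifting every sample point by 1 moves the empirical measure by exactly 1 in W1.
print(w1_empirical([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # prints 1.0
# Point masses at 0 and 2: total variation is 1, but W1 is 2 (the distance moved).
print(w1_on_support([0.0, 1.0, 2.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))  # prints 2.0
```

The second example shows how W1 differs from total variation: it accounts for how far probability mass moves, not only how much of it moves.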

Paper Structure (18 sections, 19 theorems, 49 equations)

Key Result

Lemma 1 (Pinsker's inequality)

For any two probability measures $P, Q$ on a measurable space $(\Omega,\mathcal{F})$, $d_{\mathrm{TV}}(P,Q) \leq \sqrt{\tfrac{1}{2}\,\mathrm{D}_{\mathrm{KL}}(P\,\Vert\,Q)}$.
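Pinsker's inequality can be checked numerically on finite distributions. The sketch below (helper names are hypothetical) computes the total variation distance and the KL divergence for random pairs of distributions on five outcomes and records the smallest gap between the two sides of the bound. The constant 1/2 assumes the KL divergence is measured in nats, so the code uses the natural logarithm.

```python
import math
import random
from typing import List, Sequence


def tv_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two distributions on a finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(p || q) in nats (natural log), as Pinsker's bound requires."""
    total = 0.0
    for a, b in zip(p, q):
        if a > 0:
            if b == 0:
                return math.inf  # p is not absolutely continuous w.r.t. q
            total += a * math.log(a / b)
    return total


def random_distribution(k: int, rng: random.Random) -> List[float]:
    """Random probability vector with k strictly positive entries."""
    w = [rng.random() + 1e-12 for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]


# Check d_TV(P, Q) <= sqrt(D_KL(P || Q) / 2) on random pairs over 5 outcomes.
rng = random.Random(0)
worst_slack = math.inf
for _ in range(1000):
    p, q = random_distribution(5, rng), random_distribution(5, rng)
    # max(0.0, ...) guards against a tiny negative value from rounding error.
    bound = math.sqrt(0.5 * max(0.0, kl_divergence(p, q)))
    worst_slack = min(worst_slack, bound - tv_distance(p, q))
print(worst_slack >= 0)
```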

Theorems & Definitions (36)

  • Lemma 1: Pinsker's Inequality
  • Lemma 2
  • Lemma 3: Azuma-Hoeffding inequality
  • Definition 1: $\beta$ and $\phi$ coefficients
  • Definition 2: $\beta$ and $\phi$ mixing
  • Remark 1
  • Definition 3: Geometric $\phi$-mixing
  • Lemma 4
  • Corollary 4
  • proof
  • ...and 26 more