Generalization Error of $f$-Divergence Stabilized Algorithms via Duality

Francisco Daunas, Iñaki Esnaola, Samir M. Perlaza, Gholamali Aminian

TL;DR

The paper addresses generalization in learning algorithms stabilized by $f$-divergence regularization (ERM-$f$DR) and extends the framework to constrained optimization. It develops a dual formulation using the Legendre-Fenchel transform and the implicit-function theorem, yielding an explicit normalization function $N_{Q,\boldsymbol{z}}(\lambda)$ and zero duality gap with the primal problem. It then derives an exact characterization of the generalization error $\bar{\bar{\mathsf{G}}}$ in terms of the dual solution and the conjugate $f^*$, with simplifications in key cases such as Gibbs-type algorithms where $f(x)=x\log x$. Together, these results provide concrete tools to analyze and quantify generalization under $f$-divergence regularization and to identify when ERM-$f$DR solutions coincide with constrained-optimal solutions.
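For the Gibbs-type case $f(x)=x\log x$, the regularizer is the KL divergence, and the zero-duality-gap statement can be checked numerically on a finite model set. The sketch below is a minimal illustration, not the paper's construction: it uses the classical closed form $\frac{\mathrm{d}P}{\mathrm{d}Q}(\theta) \propto \exp(-\mathsf{L}(\theta)/\lambda)$ and the classical optimal value $-\lambda \log \sum_\theta Q(\theta)\exp(-\mathsf{L}(\theta)/\lambda)$. The function names, the four-model toy losses, and the uniform reference measure are all assumptions made for illustration.

```python
import math

def gibbs_solution(losses, q, lam):
    """Minimiser of sum(p*losses) + lam * KL(p || q) on a finite model set."""
    # Shift by the minimum loss for numerical stability; the shift cancels on normalisation.
    m = min(losses)
    weights = [qi * math.exp(-(li - m) / lam) for li, qi in zip(losses, q)]
    total = sum(weights)
    return [w / total for w in weights]

def primal_objective(p, losses, q, lam):
    """Expected empirical risk plus lam times KL(p || q), i.e. f(x) = x log x."""
    risk = sum(pi * li for pi, li in zip(p, losses))
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return risk + lam * kl

def log_partition_value(losses, q, lam):
    """Closed-form optimal value -lam * log(sum_i q_i * exp(-losses_i / lam))."""
    m = min(losses)
    z = sum(qi * math.exp(-(li - m) / lam) for li, qi in zip(losses, q))
    return m - lam * math.log(z)

# Hypothetical toy data: four models, their empirical risks, a uniform reference measure.
losses = [0.9, 0.4, 0.1, 0.6]
q = [0.25, 0.25, 0.25, 0.25]
lam = 0.5

p_star = gibbs_solution(losses, q, lam)
primal = primal_objective(p_star, losses, q, lam)
closed = log_partition_value(losses, q, lam)
print(p_star, primal, closed)
```

In this toy setting the primal objective evaluated at the Gibbs measure agrees with the closed-form value up to floating-point error, and any other distribution (for example the reference measure itself) gives a value that is no smaller.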

Abstract

The solution to empirical risk minimization with $f$-divergence regularization (ERM-$f$DR) is extended to constrained optimization problems, establishing conditions under which the regularized and constrained solutions are equivalent. A dual formulation of ERM-$f$DR is introduced, providing a computationally efficient method to derive the normalization function of the ERM-$f$DR solution. This dual approach leverages the Legendre-Fenchel transform and the implicit function theorem, enabling two explicit characterizations of the generalization error: one for general algorithms under mild conditions, and another for ERM-$f$DR solutions.
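As a worked illustration of the Legendre-Fenchel transform mentioned above, the conjugate can be computed by hand for the Gibbs-type choice $f(x)=x\log x$ on $x>0$. This is the standard textbook calculation, included only as an example; the paper's general treatment is not reproduced here.

```latex
\begin{align*}
f^*(t) &= \sup_{x>0}\,\bigl(tx - x\log x\bigr), \\
0 &= \frac{\mathrm{d}}{\mathrm{d}x}\bigl(tx - x\log x\bigr) = t - \log x - 1
   \quad\Longrightarrow\quad x = e^{t-1}, \\
f^*(t) &= t\,e^{t-1} - e^{t-1}(t-1) = e^{t-1}.
\end{align*}
```

Since $f'(x) = \log x + 1$, the inverse of the derivative is $(f')^{-1}(t) = e^{t-1}$, which coincides with $(f^*)'(t)$. This is the usual link between a differentiable strictly convex function and its conjugate.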

Paper Structure

This paper contains 18 sections, 16 theorems, 15 equations.

Key Result

Theorem 1

Under Assumptions assum:a and assum:b, the solution to the optimization problem in EqOp_f_ERMRERNormal, denoted by $P^{(Q, \lambda)}_{\boldsymbol{\Theta}|\boldsymbol{Z}=\boldsymbol{z}} \in \bigtriangleup_{Q}(\mathcal{M})$, is unique, and for all $\boldsymbol{\theta} \in \operatorname{supp} Q$, [...]. Moreover, under Assumptions assum:a, assum:b, and assum:c, if $\lambda$ in EqOp_f_ERMRERNormal and $\eta$ in E...

Theorems & Definitions (21)

  • Definition 1: $f$-divergence (Csiszár, 1967)
  • Definition 2: Separable Empirical Risk Function (Perlaza et al., 2024)
  • Theorem 1
  • Definition 3: Normalization Function
  • Definition 4: Legendre-Fenchel transform (Boyd and Vandenberghe, 2004)
  • Theorem 2
  • Lemma 1
  • Theorem 3
  • Lemma 2
  • Lemma 3
  • ...and 11 more
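
To make the $f$-divergence in Definition 1 concrete, the following minimal sketch evaluates the standard discrete form $\mathsf{D}_f(P\|Q) = \sum_i Q_i\, f(P_i/Q_i)$ for two common generators. It assumes finite distributions with $Q_i > 0$ everywhere; the function names and the example vectors are hypothetical and not taken from the paper.

```python
import math

def f_divergence(p, q, f):
    """Discrete f-divergence sum_i q_i * f(p_i / q_i); assumes every q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def f_kl(x):
    """Generator x log x (KL divergence), with the convention 0 log 0 = 0."""
    return x * math.log(x) if x > 0 else 0.0

def f_chi2(x):
    """Generator (x - 1)^2 (Pearson chi-squared divergence)."""
    return (x - 1.0) ** 2

# Hypothetical example distributions on three points.
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]

kl_via_f = f_divergence(p, q, f_kl)
kl_direct = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
chi2 = f_divergence(p, q, f_chi2)
print(kl_via_f, kl_direct, chi2)
```

Both generators are convex with $f(1)=0$, so the divergence vanishes when the two distributions coincide and is nonnegative otherwise.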