Empirical Risk Minimization with Relative Entropy Regularization

Samir M. Perlaza; Gaetan Bisson; Iñaki Esnaola; Alain Jean-Marie; Stefano Rini

TL;DR

This work generalizes empirical risk minimization with relative entropy regularization (ERM-RER) by allowing a $σ$-finite reference measure instead of restricting to probability measures. It proves that, when a solution exists, the ERM-RER optimizer is a unique Gibbs probability measure that is mutually absolutely continuous with the reference, and it derives PAC-like guarantees for the ERM problem using this Gibbs solution. The authors introduce the log-partition function and its derivatives to characterize the mean, variance, and higher cumulants of the empirical risk when sampling models from the ERM-RER solution, and they establish sub-Gaussianity under mild conditions. A novel notion of sensitivity connects the impact of deviations from the ERM-RER solution to the generalization error, which, in the general $σ$-finite setting, can be bounded by the sum of lautum and mutual information between models and data; this links generalization to information-theoretic quantities beyond the classic probabilistic-reference case. The paper also analyzes the roles of coherent and consistent reference measures, concentration phenomena, and $(δ, ε)$-optimality, providing a comprehensive, unified framework that subsumes discrete and differential entropy regularization and information-risk minimization within ERM-RER.

Abstract

The empirical risk minimization (ERM) problem with relative entropy regularization (ERM-RER) is investigated under the assumption that the reference measure is a $σ$-finite measure, and not necessarily a probability measure. Under this assumption, which leads to a generalization of the ERM-RER problem allowing a larger degree of flexibility for incorporating prior knowledge, numerous relevant properties are stated. Among these properties, the solution to this problem, if it exists, is shown to be a unique probability measure, mutually absolutely continuous with the reference measure. Such a solution exhibits a probably-approximately-correct guarantee for the ERM problem independently of whether the latter possesses a solution. For a fixed dataset and under a specific condition, the empirical risk is shown to be a sub-Gaussian random variable when the models are sampled from the solution to the ERM-RER problem. The generalization capabilities of the solution to the ERM-RER problem (the Gibbs algorithm) are studied via the sensitivity of the expected empirical risk to deviations from such a solution towards alternative probability measures. Finally, an interesting connection between sensitivity, generalization error, and lautum information is established.
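As a rough illustration of the Gibbs solution described above, the following Python sketch builds it on a finite set of models. The four models, the risk values `risks`, the reference weights `q` (deliberately not summing to one, to mimic a reference measure that is not a probability measure), and the regularization factor `lam` are all made-up assumptions, not values from the paper. The sketch assumes the usual Gibbs form $\frac{dP}{dQ}(\theta) = \exp\left( -\frac{1}{\lambda} L(\theta) - K\left( -\frac{1}{\lambda} \right) \right)$, with log-partition function $K(t) = \log \int \exp\left( t L(\theta) \right) dQ(\theta)$.

```python
import math

def log_partition(q, risks, t):
    """K(t) = log sum_i q_i * exp(t * L_i) for a reference measure with weights q."""
    return math.log(sum(qi * math.exp(t * li) for qi, li in zip(q, risks)))

def gibbs(q, risks, lam):
    """Gibbs probability measure with density exp(-L/lam - K(-1/lam)) w.r.t. q."""
    k = log_partition(q, risks, -1.0 / lam)
    return [qi * math.exp(-li / lam - k) for qi, li in zip(q, risks)]

def objective(p, q, risks, lam):
    """Expected empirical risk plus lam times the relative entropy of p w.r.t. q."""
    expected_risk = sum(pi * li for pi, li in zip(p, risks))
    rel_entropy = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return expected_risk + lam * rel_entropy

# Hypothetical toy problem: four models, unnormalized reference weights.
risks = [0.1, 0.4, 0.9, 0.25]
q = [2.0, 0.5, 3.0, 1.5]
lam = 0.5

p_star = gibbs(q, risks, lam)
# Rescaling the reference measure leaves the Gibbs measure unchanged.
p_scaled = gibbs([10.0 * qi for qi in q], risks, lam)

optimal_value = objective(p_star, q, risks, lam)
# Any other probability measure, e.g. the uniform one, does no better.
alt_value = objective([0.25] * 4, q, risks, lam)
```

In this toy setting `p_star` sums to one even though `q` does not, and `optimal_value` coincides with `-lam * log_partition(q, risks, -1.0 / lam)`, which is the closed form one obtains by substituting the Gibbs density into the objective.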

Paper Structure (53 sections, 52 theorems, 85 equations, 3 figures)

Introduction
Empirical Risk Minimization (ERM)
Notation and Main Assumptions
Relative Entropy Extended to $\sigma$-Finite Measures
ERM with Relative Entropy Regularization
Type-I and Type-II Relative Entropy Regularization
The Solution to the ERM-RER Problem
Examples
ERM with Discrete Entropy Regularization
ERM with Differential Entropy Regularization
Information-Risk Minimization
Bounds on the Radon-Nikodym Derivative
Asymptotes of the Radon-Nikodym Derivative
Reference Measures
Coherent and Consistent Reference Measures
...and 38 more sections

Key Result

Theorem 1

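If $P$ and $Q$ are both probability measures on a general measurable space $\left( \Omega , \mathscr{F} \right)$, with $P$ absolutely continuous with respect to $Q$, then $D\left( P \| Q \right) \geq 0$, where $D\left( P \| Q \right)$ denotes the relative entropy of $P$ with respect to $Q$, with equality if and only if $P$ and $Q$ are identical.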
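The standard fact behind this theorem, that relative entropy between two probability measures is nonnegative and vanishes only when they coincide, can be checked numerically on a finite set. The sketch below is only an illustration under assumed values: the vectors `p`, `q_prob`, and `q_counting` are hypothetical. It also shows why the probability-measure hypothesis matters: with the counting measure as reference, the same formula returns minus the Shannon entropy of $P$, which is negative here.

```python
import math

def relative_entropy(p, q):
    """sum_i p_i * log(p_i / q_i) on a finite set, with the convention 0 * log 0 = 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi <= 0.0:
                return math.inf  # p is not absolutely continuous w.r.t. q
            total += pi * math.log(pi / qi)
    return total

p = [0.5, 0.3, 0.2]           # hypothetical probability measure P
q_prob = [0.2, 0.5, 0.3]      # hypothetical probability measure Q
q_counting = [1.0, 1.0, 1.0]  # counting measure: finite, but not a probability measure

d_prob = relative_entropy(p, q_prob)          # strictly positive since P != Q
d_self = relative_entropy(p, p)               # zero when the two measures coincide
d_counting = relative_entropy(p, q_counting)  # equals minus the entropy of P

shannon_entropy = -sum(pi * math.log(pi) for pi in p)
```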

Figures (3)

Figure 1: Mean $K^{(1)}_{Q, \boldsymbol{z}}\left( - \frac{1}{\lambda} \right)$, variance $K^{(2)}_{Q, \boldsymbol{z}}\left( - \frac{1}{\lambda} \right)$, and third central moment $K^{(3)}_{Q, \boldsymbol{z}}\left( - \frac{1}{\lambda} \right)$ of the empirical risk in Example \ref{ExampleDecreasingVariance}, with $Q\left( \mathcal{A} \right) = \frac{3}{4}$
Figure 2: Mean $K^{(1)}_{Q, \boldsymbol{z}}\left( - \frac{1}{\lambda} \right)$, variance $K^{(2)}_{Q, \boldsymbol{z}}\left( - \frac{1}{\lambda} \right)$, and third central moment $K^{(3)}_{Q, \boldsymbol{z}}\left( - \frac{1}{\lambda} \right)$ of the empirical risk in Example \ref{ExampleDecreasingVariance}, with $Q\left( \mathcal{A} \right) = \frac{1}{2}$
Figure 3: Mean $K^{(1)}_{Q, \boldsymbol{z}}\left( - \frac{1}{\lambda} \right)$, variance $K^{(2)}_{Q, \boldsymbol{z}}\left( - \frac{1}{\lambda} \right)$, and third central moment $K^{(3)}_{Q, \boldsymbol{z}}\left( - \frac{1}{\lambda} \right)$ of the empirical risk in Example \ref{ExampleDecreasingVariance}, with $Q\left( \mathcal{A} \right) = \frac{1}{4}$
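The quantities named in these captions are the first three derivatives of the log-partition function evaluated at $-\frac{1}{\lambda}$. The details of the example are not given in this summary, so the sketch below rests on a loudly stated assumption: the empirical risk is taken to be $0$ on the set $\mathcal{A}$ and $1$ on its complement, and $Q$ is taken to be a probability measure, so that $K(t) = \log\left( Q(\mathcal{A}) + \left( 1 - Q(\mathcal{A}) \right) e^{t} \right)$. Under that hypothetical setup the risk is a Bernoulli variable under the Gibbs measure, and the three derivatives reduce to the Bernoulli mean, variance, and third central moment.

```python
import math

def log_partition(q_a, t):
    """K(t) for a risk equal to 0 on A and 1 on its complement (assumed setup)."""
    return math.log(q_a + (1.0 - q_a) * math.exp(t))

def cumulants(q_a, lam):
    """Return (K', K'', K''') at t = -1/lam for the assumed two-valued risk."""
    t = -1.0 / lam
    w0 = q_a                        # unnormalized Gibbs weight of the risk-0 set A
    w1 = (1.0 - q_a) * math.exp(t)  # unnormalized Gibbs weight of the risk-1 set
    p = w1 / (w0 + w1)              # Gibbs probability that the risk equals 1
    mean = p
    variance = p * (1.0 - p)
    third_central = p * (1.0 - p) * (1.0 - 2.0 * p)
    return mean, variance, third_central

# The three values of Q(A) that appear in the captions, at a hypothetical lam = 1.
for q_a in (0.75, 0.5, 0.25):
    mean, variance, third = cumulants(q_a, 1.0)
    print(f"Q(A)={q_a}: mean={mean:.4f}, variance={variance:.4f}, third={third:.4f}")

# Variance along an increasing (hypothetical) grid of lam, with Q(A) = 3/4.
lams = [0.1, 0.5, 1.0, 2.0, 5.0]
variances = [cumulants(0.75, lam)[1] for lam in lams]
```

Under this assumed setup the variance grows with $\lambda$ when $Q(\mathcal{A}) = \frac{3}{4}$, because the Gibbs probability of the risk-1 set stays below $\frac{1}{2}$. When $Q(\mathcal{A}) = \frac{1}{4}$ the variance peaks at $\lambda = \frac{1}{\log 3}$, where that probability equals $\frac{1}{2}$. Whether the paper's example has exactly this form is not confirmed by the text here.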

Theorems & Definitions (62)

Definition 1: Generalized Relative Entropy
Theorem 1
Theorem 2
Definition 2: Expected Empirical Risk
Lemma 1
Theorem 3
Lemma 2
Lemma 3
Lemma 4
Corollary 1
...and 52 more
