Distributional Robustness Bounds Generalization Errors

Shixiong Wang; Haowei Wang

TL;DR

The paper formalizes distributional robustness and connects Bayesian methods, DRO, and regularization within a unified framework to bound generalization errors under distributional uncertainty. It demonstrates that Bayesian approaches are PAC-distributionally robust and that regularized ERM is equivalent to Bayesian methods under Dirichlet-process priors, with DRO offering a two-sided robustness perspective. The results show that generalization error can be bounded via distributional uncertainty and robustness measures, providing a cohesive explanation for why these methods often generalize well. The practical impact lies in a principled way to design models that control generalization gaps through robustness measures, with concrete implications for covariance estimation and linear regression.

Abstract

Bayesian methods, distributionally robust optimization methods, and regularization methods are three pillars of trustworthy machine learning that combat distributional uncertainty, e.g., the uncertainty of an empirical distribution relative to the true underlying distribution. This paper investigates the connections among the three frameworks and, in particular, explores why these frameworks tend to have smaller generalization errors. First, we suggest a quantitative definition of "distributional robustness", propose the concept of a "robustness measure", and formalize several philosophical concepts in distributionally robust optimization. Second, we show that Bayesian methods are distributionally robust in the probably approximately correct (PAC) sense; in addition, by constructing a Dirichlet-process-like prior in Bayesian nonparametrics, we prove that any regularized empirical risk minimization method is equivalent to a Bayesian method. Third, we show that the generalization errors of machine learning models can be characterized using the distributional uncertainty of the nominal distribution and the robustness measures of these models. This offers a new perspective for bounding generalization errors and explains, in a unified manner, why distributionally robust machine learning models, Bayesian models, and regularization models tend to have smaller generalization errors.
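To make the third point concrete, here is a minimal sketch of one standard way that distributional uncertainty can control a generalization gap. It is an illustration under stated assumptions, not the paper's own construction: it uses the Kantorovich-Rubinstein inequality, which says that for a loss that is $L$-Lipschitz in the data, the gap between the expected losses under two distributions is at most $L$ times their Wasserstein-1 distance. The one-dimensional data, the absolute-error loss, the decision value `x`, and all function names are hypothetical choices made for this example.

```python
import random


def wasserstein1_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D empirical distributions.

    In one dimension the optimal coupling matches sorted samples, so the
    distance is the mean absolute difference of the order statistics.
    """
    if len(a) != len(b):
        raise ValueError("samples must have equal size")
    return sum(abs(u - v) for u, v in zip(sorted(a), sorted(b))) / len(a)


def expected_loss(loss, sample):
    """Average loss over an empirical sample."""
    return sum(loss(z) for z in sample) / len(sample)


def gap_and_bound(loss, lipschitz, nominal, reference):
    """Return (|E_ref[loss] - E_nom[loss]|, lipschitz * W1(nominal, reference))."""
    gap = abs(expected_loss(loss, reference) - expected_loss(loss, nominal))
    bound = lipschitz * wasserstein1_1d(nominal, reference)
    return gap, bound


rng = random.Random(0)
x = 0.3  # hypothetical fixed decision (e.g. a location estimate)


def loss(z):
    # Absolute-error loss; it is 1-Lipschitz in the data point z.
    return abs(z - x)


# Assumption: `nominal` plays the role of the training (empirical) distribution,
# and `reference` is a stand-in for the unknown true distribution.
nominal = [rng.gauss(0.0, 1.0) for _ in range(50)]
reference = [rng.gauss(0.2, 1.2) for _ in range(50)]

gap, bound = gap_and_bound(loss, 1.0, nominal, reference)
print(f"gap = {gap:.4f}, Lipschitz * W1 bound = {bound:.4f}")
```

In this toy setting the inequality holds deterministically for any pair of equal-size samples, so the printed gap never exceeds the printed bound. The Lipschitz constant here acts as a crude sensitivity measure for the decision; the paper's robustness measures are defined more generally than this sketch.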

Paper Structure (60 sections, 13 theorems, 161 equations, 1 figure, 1 table)

Introduction
Background
Problem Statement
Literature Review
Research Gaps and Motivations
Contributions
Notations, Preliminaries, and Organization
Main Results
Concept System of Distributional Robustness
Formalization of Distributionally Robust Optimization
Practical Implementations of Distributionally Robust Optimization
Min-Max Distributionally Robust Optimization
Robustness and Sensitivity
Bayesian Methods
Regularized Sample-Average Approximation
...and 45 more sections

Key Result

Theorem 1

Let $(\bm x_1, L_1)$ solve the surrogate distributionally robust counterpart (eq:surrogate-dist-robust-opt-eps) at $\bar{\mathbb{P}}$ for model (eq:true-opt). Then $\bm x_1$ is distributionally robust (in the sense of Definitions def:sol-dist-robust and def:dist-robust-opt-eps) with robustness measure ...

Figures (1)

Figure 1: A visualization of the main results. (A more detailed discussion is available in Appendix \ref{append:summary-of-relations}.) In a nutshell, the Bayesian method \ref{eq:bayesian-method}, the DRO method \ref{eq:dro-method}, and the regularization method \ref{eq:regularization-method} are distributionally robust. Since distributional robustness bounds generalization errors, the three methods can generalize well.

Theorems & Definitions (74)

Definition 1: Distributional Robustness of Solution
Example 1
Example 2
Definition 2: Distributionally Robust Optimization
Definition 3: One-Sided Distributionally Robust Optimization
Definition 4: Surrogate Distributionally Robust Optimization
Definition 5: One-Sided Surrogate Distributionally Robust Optimization
Theorem 1: Surrogate Distributionally Robust Optimization
proof
Definition 6: Min-Max Distributionally Robust Optimization
...and 64 more
