Distributional Robustness Bounds Generalization Errors
Shixiong Wang, Haowei Wang
TL;DR
The paper formalizes distributional robustness and connects Bayesian methods, distributionally robust optimization (DRO), and regularization within a unified framework for bounding generalization errors under distributional uncertainty. It shows that Bayesian methods are distributionally robust in the probably approximately correct (PAC) sense and that any regularized empirical risk minimization (ERM) method is equivalent to a Bayesian method under a Dirichlet-process-like prior, with DRO offering a two-sided robustness perspective. Generalization errors can then be bounded using the distributional uncertainty of the nominal distribution and the robustness measure of the model, which explains in a unified way why these methods tend to generalize well. In practice, this gives a principled way to design models that control generalization gaps through robustness measures, with concrete implications for covariance estimation and linear regression.
Abstract
Bayesian methods, distributionally robust optimization methods, and regularization methods are three pillars of trustworthy machine learning for combating distributional uncertainty, e.g., the uncertainty of an empirical distribution compared to the true underlying distribution. This paper investigates the connections among the three frameworks and, in particular, explores why these frameworks tend to have smaller generalization errors. First, we suggest a quantitative definition of "distributional robustness", propose the concept of a "robustness measure", and formalize several philosophical concepts in distributionally robust optimization. Second, we show that Bayesian methods are distributionally robust in the probably approximately correct (PAC) sense; in addition, by constructing a Dirichlet-process-like prior in Bayesian nonparametrics, we prove that any regularized empirical risk minimization method is equivalent to a Bayesian method. Third, we show that generalization errors of machine learning models can be characterized using the distributional uncertainty of the nominal distribution and the robustness measures of these models. This offers a new perspective for bounding generalization errors and explains, in a unified manner, why distributionally robust machine learning models, Bayesian models, and regularization models tend to have smaller generalization errors.
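
As a small illustration of the link between regularization and Bayesian methods, the sketch below checks the classical special case in which ridge regression coincides with the posterior mean of a Bayesian linear model under a Gaussian prior. This is not the Dirichlet-process-like construction used in the paper; the function names `ridge_erm` and `bayes_posterior_mean`, the synthetic data, and the variance values are assumptions chosen only for illustration.

```python
import numpy as np

def ridge_erm(X, y, lam):
    """Regularized ERM: argmin_w ||y - X w||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def bayes_posterior_mean(X, y, noise_var, prior_var):
    """Posterior mean of w for y = X w + eps, with
    eps ~ N(0, noise_var * I) and prior w ~ N(0, prior_var * I)."""
    d = X.shape[1]
    precision = X.T @ X / noise_var + np.eye(d) / prior_var
    return np.linalg.solve(precision, X.T @ y / noise_var)

# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)

# The ridge penalty that matches the Gaussian prior is noise_var / prior_var.
noise_var, prior_var = 0.01, 1.0
lam = noise_var / prior_var

w_ridge = ridge_erm(X, y, lam)
w_bayes = bayes_posterior_mean(X, y, noise_var, prior_var)
print(np.allclose(w_ridge, w_bayes))
```

In this special case, the regularization strength plays the role of the ratio between noise variance and prior variance, so a stronger prior (smaller `prior_var`) corresponds to a heavier penalty.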
