
Learning Against Distributional Uncertainty: On the Trade-off Between Robustness and Specificity

Shixiong Wang, Haowei Wang, Xinke Li, Jean Honorio

TL;DR

This work tackles distributional uncertainty in supervised learning by introducing Bayesian Distributionally Robust Learning (BDR), a unifying framework that interpolates between empirical risk minimization, Bayesian modeling, regularization, and distributionally robust optimization (DRO). The authors derive a general BDR objective with a tunable weight $\beta_n$ that recovers sample average approximation (SAA, i.e., empirical risk minimization), DRO, or regularized SAA as special cases, and they establish both asymptotic and non-asymptotic properties, including consistency, asymptotic normality, generalization bounds, and potential unbiasedness. They provide solution strategies under φ-divergence and Wasserstein ambiguity sets and present a practical SGD-like algorithm (BDR-GD) with guidance on hyper-parameter tuning. Empirical results on linear SVMs and deep networks demonstrate that BDR can outperform both DRO and SAA, particularly in settings with limited data, by reducing conservatism while preserving robustness. Overall, the framework offers a principled, data-driven way to trade off robustness to unseen distributions against fidelity to training data, with strong theoretical guarantees and practical scalability.
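The role of the tunable weight can be illustrated with a small sketch. It is not the paper's BDR objective or its BDR-GD algorithm: it assumes a plain convex combination of an average cost and a worst-case cost, chosen only to match the convention in the figure captions ($\beta=0$ for SAA, $\beta=1$ for DRO). The worst-case term is a stand-in (the mean of the largest losses, i.e., a CVaR-style cost) rather than a φ-divergence or Wasserstein ambiguity set, and the names `saa_cost`, `cvar_cost`, `blended_cost`, and `alpha` are hypothetical.

```python
import math


def saa_cost(losses):
    """Sample average approximation: the plain mean of per-sample losses."""
    return sum(losses) / len(losses)


def cvar_cost(losses, alpha=0.2):
    """Stand-in worst-case cost (assumption, not the paper's ambiguity set):
    the mean of the largest ceil(alpha * n) losses."""
    k = max(1, math.ceil(alpha * len(losses)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


def blended_cost(losses, beta, alpha=0.2):
    """Assumed interpolation: beta = 0 gives the SAA cost, beta = 1 gives
    the worst-case cost, and values in between trade the two off."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return (1.0 - beta) * saa_cost(losses) + beta * cvar_cost(losses, alpha)


# Hypothetical per-sample losses for a fixed model.
losses = [0.1, 0.4, 0.2, 1.5, 0.3]
for beta in (0.0, 0.25, 0.5, 1.0):
    print(f"beta={beta:.2f}  cost={blended_cost(losses, beta):.4f}")
```

Because the worst-case term is never smaller than the average, the blended cost is non-decreasing in the weight, which is one simple way to picture moving from specificity to the training data toward robustness.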

Abstract

Trustworthy machine learning aims at combating distributional uncertainties in training data distributions compared to population distributions. Typical treatment frameworks include the Bayesian approach, (min-max) distributionally robust optimization (DRO), and regularization. However, three issues have to be raised: 1) the prior distribution in the Bayesian method and the regularizer in the regularization method are difficult to specify; 2) the DRO method tends to be overly conservative; 3) all the three methods are biased estimators of the true optimal cost. This paper studies a new framework that unifies the three approaches and addresses the three challenges above. The asymptotic properties (e.g., consistencies and asymptotic normalities), non-asymptotic properties (e.g., generalization bounds and unbiasedness), and solution methods of the proposed model are studied. The new model reveals the trade-off between the robustness to the unseen data and the specificity to the training data. Experiments on various real-world tasks validate the superiority of the proposed learning framework.

Paper Structure

This paper contains 51 sections, 10 theorems, 92 equations, 13 figures, 7 tables.

Key Result

Lemma 1

If $\bar{\mathbb{P}}$ is the mean distribution of $\mathbb{P}$ under $\mathbb{Q}$ and $\mathbb{E}_{\mathbb{Q}}\mathbb{E}_{\mathbb{P}} |h(\bm{x}, \bm{\xi})| < \infty$, then $\mathbb{E}_{\mathbb{Q}}\mathbb{E}_{\mathbb{P}} h(\bm{x}, \bm{\xi}) = \mathbb{E}_{\bar{\mathbb{P}}} h(\bm{x}, \bm{\xi})$.
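A quick numerical check can make the lemma concrete. The sketch below assumes the usual reading of the mean distribution, $\bar{\mathbb{P}}(A) = \mathbb{E}_{\mathbb{Q}}[\mathbb{P}(A)]$, which this page names (Definition 1) but does not spell out. The candidate distributions, the weights, and the cost $h(x, \xi) = (x - \xi)^2$ are made-up illustrations, and the helper names are hypothetical.

```python
def expectation(pmf, h):
    """Expected value of h(xi) under a finite distribution {xi: probability}."""
    return sum(p * h(xi) for xi, p in pmf.items())


def mean_distribution(q_weights, candidates):
    """Mixture of the candidate distributions, weighted by Q (assumed
    definition of the mean distribution)."""
    mixed = {}
    for w, pmf in zip(q_weights, candidates):
        for xi, p in pmf.items():
            mixed[xi] = mixed.get(xi, 0.0) + w * p
    return mixed


# Q puts weights 0.2, 0.5, 0.3 on three candidate distributions P.
candidates = [
    {0: 0.5, 1: 0.5},
    {0: 0.1, 1: 0.3, 2: 0.6},
    {2: 1.0},
]
q_weights = [0.2, 0.5, 0.3]

x = 1.5  # a fixed decision


def h(xi):
    """Illustrative cost h(x, xi) = (x - xi)^2 at the fixed decision x."""
    return (x - xi) ** 2


# Left-hand side: average over Q of the expectation under each P.
lhs = sum(w * expectation(pmf, h) for w, pmf in zip(q_weights, candidates))

# Right-hand side: a single expectation under the mean distribution.
p_bar = mean_distribution(q_weights, candidates)
rhs = expectation(p_bar, h)

print(abs(lhs - rhs) < 1e-12)
```

In this finite setting the identity is just linearity of expectation; the integrability condition in the lemma is what allows the same exchange in general.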

Figures (13)

  • Figure 1: Cost functions; the SAA cost cannot upper bound the true cost. (a) When $\beta = 0.50956$, the BDR cost function provides a tight upper bound for the true cost function; (b) when $\beta < 0.50956$, the BDR cost function cannot upper bound the true cost function; (c) when $\beta > 0.50956$, the BDR cost function provides a loose upper bound for the true cost function. If the feasible region of the decision variable $x$ is required to be $[-1.7, 1.7]$ rather than $\mathbb{R}$, the BDR bound in (a) is no longer tight but that in (b) becomes tight. (Source code: https://github.com/Spratm-Asleaf/Robustness-Specificity.)
  • Figure 2: Average test accuracy for 4 vs 9 over 100 trials. Averaged CPU times (seconds): BDR = 68, DRO = 66, and SAA = 7.
  • Figure 3: Error rate on the MNIST test set for models trained on partial training sets. Various $\beta$ values are used during training: $\beta=0$ for SAA learning, $\beta=1$ for DRO learning, and $\beta^*$ denotes the best value among the $\beta$ values tried for BDR learning.
  • Figure 4: Test set accuracy vs. $\beta$ across various tasks. Upper Panel: PointNet on ModelNet40 with 10% (left) and 50% (right) training data. Lower Panel: WRN-18 on CIFAR-10 with 10% (left) and 50% (right) training data. The marker "$\circ$" indicates the search set of $\beta$, i.e., $\{0.01, 0.05, 0.1, 0.5\}$. (NB: $\beta=0$ for SAA learning, $\beta=1$ for DRO learning.)
  • Figure 5: Average out-of-sample accuracy on the MNIST dataset for 3 vs 8 over 100 independent trials.
  • ...and 8 more figures

Theorems & Definitions (34)

  • Definition 1: Mean Distribution
  • Lemma 1: wang2022robustness-gen-error
  • Remark 1: Interpretation of Model \ref{eq:BDR-opt}
  • Remark 2: Robustness-Specificity Trade-off
  • Theorem 1: Asymptotic Properties of \ref{eq:bdr-method}
  • proof
  • Remark 3: Practicability of Conditions
  • Theorem 2: Generalization Bound of \ref{eq:bdr-method}
  • proof
  • Remark 4
  • ...and 24 more