Learning Against Distributional Uncertainty: On the Trade-off Between Robustness and Specificity
Shixiong Wang, Haowei Wang, Xinke Li, Jean Honorio
TL;DR
This work tackles distributional uncertainty in supervised learning by introducing Bayesian Distributionally Robust Learning (BDR), a unifying framework that interpolates between empirical risk minimization, Bayesian modeling, regularization, and distributionally robust optimization (DRO). The authors derive a general BDR objective with a tunable weight $\beta_n$ that recovers sample average approximation (SAA, i.e., empirical risk minimization), DRO, or regularized SAA as special cases, and they establish both asymptotic and non-asymptotic properties, including consistency, asymptotic normality, generalization bounds, and potential unbiasedness. They provide solution strategies under φ-divergence and Wasserstein ambiguity sets and present a practical SGD-like algorithm (BDR-GD) with guidance on hyper-parameter tuning. Empirical results on linear SVMs and deep networks demonstrate that BDR can outperform both DRO and SAA, particularly in settings with limited data, by reducing conservatism while preserving robustness. Overall, the framework offers a principled, data-driven way to trade off robustness to unseen distributions against fidelity to training data, with strong theoretical guarantees and practical scalability.
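To make the role of the weight $\beta_n$ concrete, the following is a minimal sketch and not the authors' BDR objective or their BDR-GD algorithm. It assumes, purely for illustration, that the interpolation is a convex combination of an empirical (SAA) risk and a worst-case (DRO) risk for a linear SVM with hinge loss. It also assumes the standard setting in which the type-1 Wasserstein worst-case hinge risk reduces to the empirical risk plus `eps` times the norm of the weights (features perturbed in the ℓ2 norm, labels unperturbed, unbounded feature space). The function names and the parameters `eps` and `beta` are hypothetical.

```python
import numpy as np


def saa_risk(w, X, y):
    """Empirical (sample-average) hinge risk of a linear classifier."""
    margins = y * (X @ w)
    return np.maximum(0.0, 1.0 - margins).mean()


def wasserstein_dro_risk(w, X, y, eps):
    """Worst-case hinge risk over a type-1 Wasserstein ball of radius eps.

    ASSUMPTION: features are perturbed in the l2 norm, labels are not
    perturbed, and the feature space is unbounded. Under these conditions
    the worst case reduces to the empirical risk plus eps * ||w||_2.
    """
    return saa_risk(w, X, y) + eps * np.linalg.norm(w, 2)


def blended_risk(w, X, y, eps, beta):
    """Hypothetical convex combination of SAA and DRO risks.

    beta = 1 gives SAA, beta = 0 gives DRO, and intermediate values give
    SAA with an l2 penalty of strength (1 - beta) * eps.
    """
    return beta * saa_risk(w, X, y) + (1.0 - beta) * wasserstein_dro_risk(w, X, y, eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))
    y = np.where(rng.normal(size=20) > 0, 1.0, -1.0)
    w = np.array([0.5, -1.0, 2.0])
    for beta in (0.0, 0.5, 1.0):
        print(beta, blended_risk(w, X, y, eps=0.1, beta=beta))
```

Under these assumptions, an intermediate `beta` behaves like SAA with a weaker regularizer than full DRO, which is one way to read the reduced conservatism described above. The actual BDR objective and the paper's tuning rule for $\beta_n$ may differ from this sketch.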
Abstract
Trustworthy machine learning aims to combat distributional uncertainty, that is, the discrepancy between the training data distribution and the population distribution. Typical treatment frameworks include the Bayesian approach, (min-max) distributionally robust optimization (DRO), and regularization. However, three issues arise: 1) the prior distribution in the Bayesian method and the regularizer in the regularization method are difficult to specify; 2) the DRO method tends to be overly conservative; 3) all three methods yield biased estimators of the true optimal cost. This paper studies a new framework that unifies the three approaches and addresses the three challenges above. The asymptotic properties (e.g., consistency and asymptotic normality), non-asymptotic properties (e.g., generalization bounds and unbiasedness), and solution methods of the proposed model are studied. The new model reveals the trade-off between robustness to unseen data and specificity to the training data. Experiments on various real-world tasks validate the superiority of the proposed learning framework.
