From Distributional Robustness to Robust Statistics: A Confidence Sets Perspective

Gabriel Chan, Bart Van Parys, Amine Bennouna

TL;DR

It is shown that a DRO ambiguity set, based on the Kullback-Leibler divergence and total variation distance, is uniformly minimal, meaning it is the smallest confidence set that contains the unknown distribution with a given confidence power.

Abstract

We establish a connection between distributionally robust optimization (DRO) and classical robust statistics. We demonstrate that this connection arises naturally in the context of estimation under data corruption, where the goal is to construct "minimal" confidence sets for the unknown data-generating distribution. Specifically, we show that a DRO ambiguity set, based on the Kullback-Leibler divergence and total variation distance, is uniformly minimal, meaning it is the smallest confidence set that contains the unknown distribution with a given confidence power. Moreover, we prove that when parametric assumptions are imposed on the unknown distribution, the ambiguity set is never larger than a confidence set based on the optimal estimator proposed by Huber. This insight reveals that the commonly observed conservatism of DRO formulations is not intrinsic to these formulations themselves but rather stems from the non-parametric framework in which they are employed.
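As an illustration of the two distances the ambiguity set is built from, here is a minimal Python sketch for distributions on a finite support. The function names, the list-of-probabilities representation, and in particular the way the two distances are combined in `witnesses_membership` (a KL ball of radius `r` around the empirical distribution, followed by a TV ball of radius `alpha`) are assumptions made for illustration; the abstract does not spell out the exact construction.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for finite-support distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # convention: 0 * log(0 / q) = 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def total_variation(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def witnesses_membership(p_hat, q, p, r, alpha):
    """Hypothetical check (assumed construction, not taken from the paper):
    the intermediate distribution q certifies that p is in the set if
    KL(p_hat || q) <= r and TV(q, p) <= alpha."""
    return kl_divergence(p_hat, q) <= r and total_variation(q, p) <= alpha


if __name__ == "__main__":
    p_hat = [0.5, 0.5]  # empirical distribution
    q = [0.4, 0.6]      # candidate intermediate distribution
    p = [0.3, 0.7]      # candidate data-generating distribution
    print(kl_divergence(p_hat, q), total_variation(q, p))
    print(witnesses_membership(p_hat, q, p, r=0.05, alpha=0.15))
```

A full membership test would search over all intermediate distributions `q` rather than check a single candidate; the sketch only verifies one proposed witness.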

Paper Structure

This paper contains 20 sections, 16 theorems, 81 equations, 5 figures.

Key Result

Theorem 2.7

For any regular confidence set estimator $S$ satisfying the coverage guarantee (eq:feasibility:parametric) with $r\geq 0$, we have $S_{r,\alpha}(\hat{\mathbb P}) \subseteq S(\hat{\mathbb P})$ for all $\hat{\mathbb P} \in \mathcal{P}$.
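For orientation, the set estimator in this theorem can be read as a sublevel set of the statistical resolution function $r^\alpha$, generalizing the instance $S_{0.4,\alpha}(\hat{\mathbb P})$ given in the caption of Figure 1. The display below is a sketch under that assumption; the nestedness in the second line follows directly from the sublevel-set form and is not a statement quoted from the paper.

```latex
\begin{align*}
  S_{r,\alpha}(\hat{\mathbb P})
    &= \{\, \theta \;:\; r^\alpha(\hat{\mathbb P}, \theta) \leq r \,\}, \\
  r \leq r'
    &\;\Longrightarrow\;
    S_{r,\alpha}(\hat{\mathbb P}) \subseteq S_{r',\alpha}(\hat{\mathbb P}).
\end{align*}
```

Read this way, the theorem says that no regular estimator with the same coverage guarantee can exclude any parameter that lies in this sublevel set.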

Figures (5)

  • Figure 1: The statistical resolution function can be nonconvex, resulting in nonconvex set estimators $S_{r,\alpha}(\hat{\mathbb P})$. For instance, with $\alpha = 0.3$, the set $S_{0.4,\alpha}(\hat{\mathbb P}) = \{ \theta \; : \; r^\alpha(\hat{\mathbb{P}},\theta) \leq 0.4 \}$, indicated in green, is the union of two disjoint intervals. The region in red denotes $S_{0,\alpha}(\hat{\mathbb P})$, identified as the roots of the statistical resolution function. The fact that the latter region is not a singleton indicates that learning the unknown parameter exactly under corruption is not possible, even with infinite data.
  • Figure 2: The statistical resolution function $r_{\mathrm{wc}}^\alpha$ is characterized as the minimum KL distance between a common distribution and the sets $\mathcal{P}^-_\Delta:= \{ \mathbb Q_\Delta^-\ : \ \mathrm{TV}(\mathbb Q_\Delta^-, \mathbb P_{-\Delta})\leq \alpha \}$ and $\mathcal{P}^+_\Delta:= \{ \mathbb Q_\Delta^+\ : \ \mathrm{TV}(\mathbb Q_\Delta^+, \mathbb P_{+\Delta})\leq \alpha \}$ shown in blue. The densely dotted sets represent all distributions at KL distance at most $r_{\mathrm{wc}}^\alpha(\Delta)$ from the sets $\mathcal{P}^-_\Delta$ and $\mathcal{P}^+_\Delta$.
  • Figure 3:
  • Figure 4: Phase diagram in the statistical resolution / corruption level $(r, \alpha)$ space for a standard normal location family. The green (respectively blue) region indicates the regime in which the generalized mean (respectively median) is worst-case minimal. The dotted lines indicate phase transitions in terms of the worst-case radius $\Delta_{\mathrm{wc}}(r,\alpha)$ and the almost sure minimal radius $\Delta_{\mathrm{as}}(r,\alpha)$. In particular, the red and black dotted lines delimit the $(r,\alpha)$ phases in which confidence intervals exist with non-trivial worst-case radius and almost sure radius, respectively.
  • Figure 5: Illustration of the worst-case distributions in the three different almost sure efficiency regimes.

Theorems & Definitions (34)

  • Example 2.1: Non-Parametric
  • Example 2.2: Exponential Family
  • Example 2.3: Location Family
  • Example 2.4: Location-Scale Family
  • Definition 2.5: Confidence Set Estimators
  • Example 2.6: Normal Family
  • Theorem 2.7: KL-TV Uniform Minimality
  • Theorem 2.8: Coverage Guarantee
  • Theorem 2.9: Limit
  • Example 3.1: Location Families
  • ...and 24 more