Heavy-tailed Contamination is Easier than Adversarial Contamination

Yeshwanth Cherapanamjeri, Daniel Lee

TL;DR

The paper proves that any adversarially robust estimator is also resilient to heavy-tailed outliers, for any statistical estimation problem with i.i.d. data. This implies that heavy-tailed estimation is likely easier than adversarially robust estimation, opening the door to novel algorithmic approaches for the heavy-tailed setting.

Abstract

A large body of work in the statistics and computer science communities, dating back to Huber (Huber, 1960), has led to statistically and computationally efficient outlier-robust estimators. Two outlier models have received significant attention: the adversarial model and the heavy-tailed model. The former models outliers as the result of a malicious adversary manipulating the data, while the latter relaxes distributional assumptions on the data, allowing outliers to occur naturally as part of the data-generating process. In the adversarial setting, the goal is to develop estimators robust to the largest possible fraction of outliers; in the heavy-tailed setting, one seeks estimators that combat the loss of statistical efficiency, where the dependence on the failure probability is paramount. Despite these distinct motivations, the algorithmic approaches to the two settings have converged, prompting questions about the relationship between the models. In this paper, we investigate and provide a principled explanation for this phenomenon. First, we prove that any adversarially robust estimator is also resilient to heavy-tailed outliers for any statistical estimation problem with i.i.d. data. As a corollary, optimal adversarially robust estimators for mean estimation, linear regression, and covariance estimation are also optimal heavy-tailed estimators. Conversely, for mean estimation, arguably the simplest high-dimensional estimation task, we construct heavy-tailed estimators whose application to the adversarial setting requires any black-box reduction to remove almost all the outliers in the data. Taken together, our results imply that heavy-tailed estimation is likely easier than adversarially robust estimation, opening the door to novel algorithmic approaches for the heavy-tailed setting. Additionally, confidence intervals obtained for adversarially robust estimation also hold with high probability.
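The two outlier models contrasted in the abstract can be illustrated with a toy one-dimensional sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the symmetrised Pareto distribution standing in for heavy-tailed data, the randomly placed outliers standing in for an adversary (a real adversary may inspect the sample before corrupting it), and the trimmed mean, which is swapped in as a simple classical robust estimator.

```python
import random
import statistics


def heavy_tailed_sample(n, rng):
    # Heavy-tailed model: outliers arise from the distribution itself.
    # Symmetrised Pareto draws with shape 2.5 (mean 0, finite variance).
    return [rng.choice((-1.0, 1.0)) * rng.paretovariate(2.5) for _ in range(n)]


def adversarial_sample(n, eps, rng, outlier=50.0):
    # Adversarial model: start from clean Gaussian data, then overwrite
    # an eps fraction of the points with an arbitrary value.
    corrupted = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for i in rng.sample(range(n), int(eps * n)):
        corrupted[i] = outlier
    return corrupted


def trimmed_mean(xs, eps):
    # Drop the eps*n smallest and eps*n largest points, average the rest.
    k = int(eps * len(xs))
    s = sorted(xs)
    kept = s[k:len(s) - k] if k > 0 else s
    return statistics.fmean(kept)


if __name__ == "__main__":
    rng = random.Random(0)
    adv = adversarial_sample(2000, 0.05, rng)
    ht = heavy_tailed_sample(2000, rng)
    print("adversarial, plain mean:   ", statistics.fmean(adv))
    print("adversarial, trimmed mean: ", trimmed_mean(adv, 0.05))
    print("heavy-tailed, trimmed mean:", trimmed_mean(ht, 0.05))
```

In this toy setup the same estimator is applied unchanged to both kinds of data, which is the direction of the paper's first result; the sketch says nothing about optimal rates or failure probabilities.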

Paper Structure

This paper contains 22 sections, 26 theorems, 122 equations, 1 figure, 2 algorithms.

Key Result

Theorem 1.3

For any constant $\varepsilon \in (0, 0.1)$, there exists an absolute constant $c > 0$ such that any adversarially robust estimation algorithm with no internal randomness that satisfies the definition of an adversarially robust estimator (def:adv_rob_est), when given corruption parameter $\varepsilon$ and clean sample $\bm{X}$, satisfies:

Figures (1)

  • Figure 1: Illustration of the class of reductions to which \ref{thm:sub_g_informal} applies. The input dataset $\bm{X}$ is processed by the reduction $\mathcal{R}$ to produce another dataset $\bm{Y}$, which is then passed to the estimator $\mathcal{A}$ to produce the final output $\widehat{\theta}$.
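
The pipeline in the Figure 1 caption (dataset $\bm{X}$, reduction $\mathcal{R}$, dataset $\bm{Y}$, estimator $\mathcal{A}$, output $\widehat{\theta}$) can be sketched as plain function composition. The concrete reduction `drop_far_points` and estimator `coordinate_median` below are hypothetical placeholders chosen for illustration, not the constructions used in the paper.

```python
import math
import statistics
from typing import Callable, List

Dataset = List[List[float]]


def drop_far_points(X: Dataset, radius: float) -> Dataset:
    # Hypothetical reduction R: keep only points within `radius` of the origin.
    return [x for x in X if math.sqrt(sum(v * v for v in x)) <= radius]


def coordinate_median(Y: Dataset) -> List[float]:
    # Hypothetical estimator A: median of each coordinate.
    return [statistics.median(col) for col in zip(*Y)]


def compose(
    reduction: Callable[[Dataset], Dataset],
    estimator: Callable[[Dataset], List[float]],
) -> Callable[[Dataset], List[float]]:
    # Black-box composition: the estimator only ever sees Y = R(X).
    def pipeline(X: Dataset) -> List[float]:
        return estimator(reduction(X))

    return pipeline


if __name__ == "__main__":
    X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [100.0, 100.0]]
    pipeline = compose(lambda data: drop_far_points(data, 10.0), coordinate_median)
    print(pipeline(X))  # [1.0, 1.0]
```

The composition treats the estimator as a black box: the reduction may change the dataset, but it cannot alter how the estimator processes whatever it receives.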

Theorems & Definitions (44)

  • Definition 1.1: Strong Adversarial Contamination Model
  • Definition 1.2: Generic Adversarially Robust Estimator
  • Theorem 1.3: Informal
  • Definition 1.4: Optimal Adversarially Robust Mean Estimator
  • Definition 1.5
  • Corollary 1.6
  • Definition 1.7
  • Corollary 1.8
  • Corollary 1.9
  • Definition 1.10
  • ...and 34 more