Agnostic Multi-Robust Learning Using ERM

Saba Ahmadi, Avrim Blum, Omar Montasser, Kevin Stangl

TL;DR

This paper tackles the problem of agnostic (non-realizable) robust learning under patch-like adversarial perturbations. It shows that naive ERM on augmented data can fail when zero robust error is impossible, and provides a Feige et al.-style reduction that yields robust guarantees using only an ERM oracle, with a bound on the expected robust loss of the majority vote of T predictors. The authors extend the framework to a multi-group setting, introducing a two-layer boosting method that achieves low robust loss across multiple disjoint groups, with randomized and deterministic (majority) guarantees. They establish generalization bounds based on the VC dimension of the robust-loss class and the group class, yielding sample complexities that scale with those complexities and logarithmic factors in the perturbation budget k. Overall, the work provides a theoretical pathway to obtain robust performance via ERM-based training combined with boosting, including a novel multi-robustness objective that distributes robust performance across diverse subgroups without requiring test-time group membership.

Abstract

A fundamental problem in robust learning is asymmetry: a learner needs to correctly classify every one of exponentially many perturbations that an adversary might make to a test-time natural example. In contrast, the attacker only needs to find one successful perturbation. Xiang et al. [2022] proposed an algorithm that, in the context of patch attacks for image classification, reduces the effective number of perturbations from exponential to polynomial and learns using an ERM oracle. However, to achieve its guarantee, their algorithm requires the natural examples to be robustly realizable. This prompts a natural question: can we extend their approach to the non-robustly-realizable case, where there is no classifier with zero robust error? Our first contribution is to answer this question affirmatively by reducing the problem to a setting in which an algorithm proposed by Feige et al. [2015] can be applied, and in the process extending their guarantees. Next, we extend our results to a multi-group setting and introduce a novel agnostic multi-robust learning problem, where the goal is to learn a predictor that achieves low robust loss on a (potentially) rich collection of subgroups.
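The asymmetry described in the abstract can be made concrete with the robust 0-1 loss: a predictor is charged an error on a natural example as soon as any single allowed perturbation is misclassified. The following is a minimal sketch, not the paper's code. The function names, the 1-D threshold predictor, the perturbation set (shifts of at most 1, including the unperturbed point), and the labels in {-1, +1} are all assumptions made here for illustration.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def robust_loss(h: Callable[[float], int], perturbations: Iterable[float], y: int) -> int:
    """Return 1 if h misclassifies at least one perturbation, else 0."""
    return int(any(h(z) != y for z in perturbations))


def empirical_robust_loss(
    h: Callable[[float], int],
    data: Sequence[Tuple[float, int]],
    perturb: Callable[[float], Iterable[float]],
) -> float:
    """Average robust loss of h over labeled natural examples (x, y)."""
    return sum(robust_loss(h, perturb(x), y) for x, y in data) / len(data)


# Hypothetical toy setup: 1-D points, each with three allowed perturbations.
def perturb(x: float) -> List[float]:
    return [x - 1.0, x, x + 1.0]


def threshold_at_zero(x: float) -> int:
    return 1 if x >= 0 else -1


data = [(3.0, 1), (0.5, 1), (-2.0, -1)]

# Natural (unperturbed) error: every example is classified correctly.
natural_error = sum(threshold_at_zero(x) != y for x, y in data) / len(data)

# Robust error: the example at 0.5 can be pushed to -0.5, so it counts as an error.
robust_error = empirical_robust_loss(threshold_at_zero, data, perturb)
```

In this toy setup the predictor has zero natural error but nonzero robust error, because one perturbation of a single example crosses the threshold.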

Paper Structure

This paper contains 28 sections, 13 theorems, 63 equations, 1 figure, and 1 algorithm.

Key Result

Theorem 1

Set $T(\varepsilon) = \frac{32 \ln k}{\varepsilon^2}$ and $m(\varepsilon, \delta) = O\left( \frac{{\rm vc}(\mathcal{H})(\ln k)^2}{\varepsilon^4}\ln\left( \frac{\ln k}{\varepsilon^2} \right)+\frac{\ln(1/\delta)}{\varepsilon^2} \right)$. Then, for any distribution $\mathcal{D}$ over $\mathcal{X}$ […] where ${\rm MAJ}(h_1,\dots, h_{T(\varepsilon)})$ denotes the majority vote of the predictors $h_1,\dots,$ […]
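As a rough illustration of the quantities named in Theorem 1, the sketch below evaluates the round count $T(\varepsilon) = 32 \ln k / \varepsilon^2$ and forms a majority vote over a list of predictors. It is a sketch under stated assumptions, not the authors' implementation: rounding $T$ up to an integer, requiring $k \ge 2$, using labels in {-1, +1}, and breaking ties toward +1 are choices made here, and the function names are hypothetical. The sample size $m(\varepsilon, \delta)$ is stated only up to constants, so it is not computed.

```python
import math
from typing import Callable, Sequence


def num_rounds(k: int, eps: float) -> int:
    """Evaluate T(eps) = 32 ln(k) / eps^2, rounded up (rounding is an assumption)."""
    if k < 2:
        raise ValueError("k must be at least 2 so that ln(k) > 0")
    if not 0.0 < eps <= 1.0:
        raise ValueError("eps must lie in (0, 1]")
    return math.ceil(32.0 * math.log(k) / eps ** 2)


def majority_vote(predictors: Sequence[Callable[[float], int]]) -> Callable[[float], int]:
    """Combine {-1, +1}-valued predictors; ties go to +1 (an assumption)."""
    if not predictors:
        raise ValueError("need at least one predictor")

    def maj(x: float) -> int:
        total = sum(h(x) for h in predictors)
        return 1 if total >= 0 else -1

    return maj


# Hypothetical usage with three hand-written predictors.
always_pos = lambda x: 1
always_neg = lambda x: -1
sign_of_x = lambda x: 1 if x > 0 else -1

maj = majority_vote([always_pos, always_neg, sign_of_x])
rounds = num_rounds(k=100, eps=0.5)
```

Note how quickly the round count grows as $\varepsilon$ shrinks: the dependence is $1/\varepsilon^2$, while the dependence on $k$ is only logarithmic.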

Figures (1)

  • Figure 1: $\textsf{ERM}$ failure mode in the robustly un-realizable case. Blue, red, and black points show, respectively, original examples with a positive label, original examples with a negative label, and perturbations of original examples.

Theorems & Definitions (34)

  • Example 1
  • Theorem 1
  • Remark 1
  • Lemma 2
  • Lemma 3: Feige et al. [2015]
  • Lemma 4: VC Dimension for the Robust Loss (Attias et al. [2022])
  • Definition 1: Multi-Robustness
  • Definition 2: $\beta$-Multi-Robustness
  • Definition 3: Multi-Robustness on Average
  • Remark 2
  • ...and 24 more