Is nasty noise actually harder than malicious noise?

Guy Blanc, Yizhi Huang, Tal Malkin, Rocco A. Servedio

TL;DR

This work compares the difficulty of learning Boolean functions under two adversarial noise models, malicious and nasty, in both the distribution-free and fixed-distribution PAC settings. In the distribution-free setting, it proves that nasty noise is no harder than malicious noise, via a sequence of reductions that connect malicious noise to TV-contamination and then to nasty noise, aided by amplification techniques. In the fixed-distribution setting, the paper exhibits arbitrarily large separations between the tolerable noise rates under standard cryptographic assumptions. It also introduces ICE algorithms (those that ignore contradictory examples), for which the two noise models are equivalent up to a factor of two in the noise rate, and shows that this factor of two is necessary. The results rely on a blend of error-correcting codes, PRFs, seeded extractors, and careful probabilistic couplings, yielding both upper bounds and matching lower bounds for robust learning under adversarial data corruption. Together they clarify when nasty noise matches malicious noise in power and when, under cryptographic hardness assumptions, it is strictly harder, with implications for designing noise-tolerant learning systems.

Abstract

We consider the relative abilities and limitations of computationally efficient algorithms for learning in the presence of noise, under two well-studied and challenging adversarial noise models for learning Boolean functions: malicious noise, in which an adversary can arbitrarily corrupt a random subset of examples given to the learner; and nasty noise, in which an adversary can arbitrarily corrupt an adversarially chosen subset of examples given to the learner. We consider both the distribution-independent and fixed-distribution settings. Our main results highlight a dramatic difference between these two settings: For distribution-independent learning, we prove a strong equivalence between the two noise models: If a class ${\cal C}$ of functions is efficiently learnable in the presence of $\eta$-rate malicious noise, then it is also efficiently learnable in the presence of $\eta$-rate nasty noise. In sharp contrast, for the fixed-distribution setting we show an arbitrarily large separation: Under a standard cryptographic assumption, for any arbitrarily large value $r$ there exists a concept class for which there is a ratio of $r$ between the rate $\eta_{\mathrm{malicious}}$ of malicious noise that polynomial-time learning algorithms can tolerate, versus the rate $\eta_{\mathrm{nasty}}$ of nasty noise that such learning algorithms can tolerate. To offset the negative result for the fixed-distribution setting, we define a broad and natural class of algorithms, namely those that ignore contradictory examples (ICE). We show that for these algorithms, malicious noise and nasty noise are equivalent up to a factor of two in the noise rate: Any efficient ICE learner that succeeds with $\eta$-rate malicious noise can be converted to an efficient learner that succeeds with $\eta/2$-rate nasty noise. We further show that the above factor of two is necessary, again under a standard cryptographic assumption.
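The difference between the two noise models described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the function names, the callback interface for the adversary, and the nasty-noise budget of $\lfloor \eta n \rfloor$ corrupted examples are simplifying assumptions made here for illustration, and the paper's formal definitions may differ in detail.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Tuple[int, ...], int]  # (point in {-1,+1}^d, label in {-1,+1})
Replace = Callable[[Sequence[Example], int], Example]
Choose = Callable[[Sequence[Example], int], List[int]]


def malicious_corrupt(clean: Sequence[Example], eta: float,
                      replace: Replace, rng: random.Random):
    """Malicious noise (sketch): each position is exposed to the adversary
    independently with probability eta; the adversary does not pick which ones.
    Assumption: the adversary callback sees the whole clean sample."""
    exposed = [i for i in range(len(clean)) if rng.random() < eta]
    noisy = list(clean)
    for i in exposed:
        noisy[i] = replace(clean, i)
    return noisy, exposed


def nasty_corrupt(clean: Sequence[Example], eta: float,
                  choose: Choose, replace: Replace):
    """Nasty noise (sketch): the adversary inspects the sample and picks the
    positions itself. Assumption: it may pick at most floor(eta * n) of them."""
    budget = int(eta * len(clean))
    chosen = list(choose(clean, budget))[:budget]
    noisy = list(clean)
    for i in chosen:
        noisy[i] = replace(clean, i)
    return noisy, chosen


def flip_label(clean: Sequence[Example], i: int) -> Example:
    """Hypothetical adversary move: keep the point, flip its label."""
    x, y = clean[i]
    return (x, -y)


def pick_positives(clean: Sequence[Example], budget: int) -> List[int]:
    """Hypothetical nasty strategy: spend the whole budget on positive examples."""
    return [i for i, (_, y) in enumerate(clean) if y == 1][:budget]


# Toy sample of 40 examples labeled by the first coordinate (20 positive, 20 negative).
clean = [((1 if i % 2 == 0 else -1, 1), 1 if i % 2 == 0 else -1) for i in range(40)]

noisy_m, exposed = malicious_corrupt(clean, 0.25, flip_label, random.Random(0))
noisy_n, chosen = nasty_corrupt(clean, 0.25, pick_positives, flip_label)
print("malicious: corrupted positions", exposed)
print("nasty:     corrupted positions", chosen)
```

In this toy run the nasty adversary concentrates all of its corruptions on positive examples, whereas the malicious adversary can only act on whichever positions the coin flips expose. That targeting ability is the only distinction the sketch is meant to convey.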

Paper Structure

This paper contains 61 sections, 42 theorems, 95 equations, 11 figures.

Key Result

Theorem 1

Suppose that a class ${\cal C}$ of functions over $\{-1,+1\}^d$ is learnable to accuracy $\varepsilon$ and confidence $\delta$ in $\mathrm{poly}(d,1/\varepsilon,\log(1/\delta))$ time in the presence of $\eta$-rate malicious noise. Then ${\cal C}$ is learnable to accuracy $1.01\varepsilon$ and confid

Figures (11)

  • Figure 1: The structure of the proof of \ref{thm:dist-free-combined} (formal version of \ref{thm:main-distribution-free-informal}).
  • Figure 2: An algorithm for amplifying the success probability of a learner.
  • Figure 3: Division of the domain into a key side and a value side.
  • Figure 4: How the key side $X_{\mathrm{key}}$ is labeled by a concept $c_{p, q}$.
  • Figure 5: How the value side $X_{\mathrm{value}}$ is labeled by a concept $c_{p, q}$.
  • ...and 6 more figures

Theorems & Definitions (86)

  • Remark 1.1: On computational efficiency
  • Theorem 1: Nasty noise is no harder than malicious noise for efficient distribution-independent learning, informal version of \ref{thm:dist-free-combined}
  • Theorem 2: Amplifying the success probability with nasty noise, informal version of \ref{thm:boost-no-holdout}
  • Theorem 3: Separation between malicious and nasty noise, informal version of \ref{thm:fixed-separation}
  • Definition 1.2: Ignore contradictory examples (ICE) algorithms, informal version of \ref{def:ICE-algorithm}
  • Theorem 4: ICE-malicious learners imply nasty noise learners, informal version of \ref{thm:ICE-malicious-nasty}
  • Theorem 5: Tightness of \ref{thm:ICE-malicious-nasty-informal}, informal version of \ref{thm:ICE-bad-news}
  • Theorem 1 (restated): Nasty noise is no harder than malicious noise for efficient distribution-independent learning, informal version of \ref{thm:dist-free-combined}
  • Definition 2.1: TV noise
  • Theorem 2 (restated): Amplifying the success probability with nasty noise, informal version of \ref{thm:boost-no-holdout}
  • ...and 76 more
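As a rough illustration of the idea behind Definition 1.2, the sketch below wraps a learner so that it ignores contradictory examples. This is one plausible reading of the informal name only: treating a point as "contradictory" when it appears in the sample with both labels, and dropping every copy of such a point, are assumptions made here, and the paper's formal definition may be stated differently. The names `drop_contradictory`, `ice_wrap`, and `majority_label` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple

Example = Tuple[Hashable, int]  # (point, label in {-1,+1})


def drop_contradictory(sample: Sequence[Example]) -> List[Example]:
    """Remove every example whose point occurs with both labels in the sample."""
    labels_seen = defaultdict(set)
    for x, y in sample:
        labels_seen[x].add(y)
    return [(x, y) for x, y in sample if len(labels_seen[x]) == 1]


def ice_wrap(base_learner: Callable[[List[Example]], object]):
    """Turn any learner into one that never looks at contradictory examples."""
    def learner(sample: Sequence[Example]):
        return base_learner(drop_contradictory(sample))
    return learner


def majority_label(sample: List[Example]) -> int:
    """Toy base learner (assumption): predict the most common label, +1 on ties or empty input."""
    counts = Counter(y for _, y in sample)
    return 1 if counts[1] >= counts[-1] else -1


sample = [((1, 1), 1), ((1, 1), -1), ((-1, 1), -1), ((-1, 1), -1), ((1, -1), 1)]
filtered = drop_contradictory(sample)
print(filtered)                          # the two copies of (1, 1) are gone
print(ice_wrap(majority_label)(sample))  # majority over the remaining examples
```

Under this reading, an adversary gains nothing from inserting a conflicting copy of a clean example beyond removing that example from the learner's view.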