Robust Learning with Optimal Error

Guy Blanc

Abstract

We construct algorithms with optimal error for learning with adversarial noise. The overarching theme of this work is that the use of \textsl{randomized} hypotheses can substantially improve upon the best error rates achievable with deterministic hypotheses.

  • For $\eta$-rate malicious noise, we show the optimal error is $\frac{1}{2} \cdot \eta/(1-\eta)$, improving on the optimal error of deterministic hypotheses by a factor of $1/2$. This answers an open question of Cesa-Bianchi et al. (JACM 1999), who showed randomness can improve error by a factor of $6/7$.
  • For $\eta$-rate nasty noise, we show the optimal error is $\frac{3}{2} \cdot \eta$ for distribution-independent learners and $\eta$ for fixed-distribution learners, both improving upon the optimal $2\eta$ error of deterministic hypotheses. This closes a gap first noted by Bshouty et al. (Theoretical Computer Science 2002) when they introduced nasty noise, and reiterated in the recent works of Klivans et al. (NeurIPS 2025) and Blanc et al. (SODA 2026).
  • For $\eta$-rate agnostic noise and the closely related nasty classification noise model, we show the optimal error is $\eta$, improving upon the optimal $2\eta$ error of deterministic hypotheses.

All of our learners have sample complexity linear in the VC dimension of the concept class and polynomial in the inverse excess error. All except the fixed-distribution nasty noise learner are time efficient given access to an oracle for empirical risk minimization.
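To make the claimed improvements concrete, here is a small sketch that evaluates the error rates stated above at an illustrative noise rate $\eta = 0.1$ (the formulas are taken directly from the abstract; the specific value of $\eta$ is only an example):

```python
# Optimal error rates stated in the abstract, as functions of the noise rate eta.
# "Randomized" = achievable with randomized hypotheses (this work);
# "deterministic" = the best achievable with deterministic hypotheses.

def malicious_randomized(eta):
    # Optimal error under eta-rate malicious noise: (1/2) * eta / (1 - eta).
    return 0.5 * eta / (1 - eta)

def malicious_deterministic(eta):
    # Deterministic optimum is a factor 2 worse: eta / (1 - eta).
    return eta / (1 - eta)

def nasty_dist_independent(eta):
    # Distribution-independent learning with nasty noise: (3/2) * eta.
    return 1.5 * eta

def nasty_fixed_dist(eta):
    # Fixed-distribution learning with nasty noise: eta.
    return eta

def nasty_deterministic(eta):
    # Deterministic optimum for nasty noise: 2 * eta.
    return 2.0 * eta

def agnostic_randomized(eta):
    # Agnostic / nasty classification noise: eta (vs. 2 * eta deterministically).
    return eta

eta = 0.1
print(f"malicious: {malicious_randomized(eta):.4f} vs {malicious_deterministic(eta):.4f}")
print(f"nasty:     {nasty_dist_independent(eta):.4f} / {nasty_fixed_dist(eta):.4f} vs {nasty_deterministic(eta):.4f}")
```

At $\eta = 0.1$ the randomized malicious-noise learner achieves error about $0.056$ versus $0.111$ deterministically, exhibiting the factor-$1/2$ gap.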

Paper Structure

This paper contains 37 sections, 24 theorems, 186 equations, 6 figures.

Key Result

Theorem 1

For any concept class $\mathcal{C}$ with VC dimension $d$ and $\varepsilon > 0$, there is an algorithm that learns $\mathcal{C}$ with $\eta$-malicious noise and error at most $\frac{1}{2} \cdot \frac{\eta}{1-\eta} + \varepsilon$ using $O(\frac{d}{\varepsilon^2})$ samples. Furthermore, this algorithm
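The sample-complexity bound in Theorem 1 can be sketched numerically as follows. Note that the theorem only fixes the asymptotic rate $O(d/\varepsilon^2)$; the leading constant `C` below is a hypothetical placeholder, not part of the stated result:

```python
import math

def sample_bound(d, eps, C=1.0):
    # m = O(d / eps^2) samples suffice; C is a hypothetical constant,
    # since the theorem only pins down the asymptotic rate.
    return math.ceil(C * d / eps ** 2)

def error_bound(eta, eps):
    # Error achieved by the learner: (1/2) * eta / (1 - eta) + eps.
    return 0.5 * eta / (1 - eta) + eps

print(sample_bound(d=10, eps=0.05))          # 4000 (with C = 1)
print(round(error_bound(eta=0.1, eps=0.05), 4))
```

Halving the excess error $\varepsilon$ quadruples the sample bound, while the achieved error approaches the optimal $\frac{1}{2} \cdot \frac{\eta}{1-\eta}$.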

Figures (6)

  • Figure 1: Our malicious noise learner.
  • Figure 2: Our algorithm meeting the requirements of \ref{thm:general-optimization}
  • Figure 3: Algorithm 2 in J13.
  • Figure 4: The shaded region depicts the constraints of \ref{eq:g-constraints-mal-body}, which our choice of $g$ lies within.
  • Figure 5: Visualization of our construction used in the proof of \ref{thm:nasty-lb-body}.
  • ...and 1 more figure

Theorems & Definitions (86)

  • Theorem 1: Optimal learning with malicious noise, see \ref{thm:malicious-body} for the full version
  • Theorem 2: Optimal distribution-independent learning with nasty noise, see \ref{thm:nasty-dist-free-upper-body} and \ref{thm:nasty-lb-body} for the full versions
  • Theorem 3: Optimal fixed-distribution learning with nasty noise, see \ref{thm:nasty-fixed-dist-body} for the full version
  • Theorem 4: Optimal learning with nasty classification noise, see \ref{thm:agnostic-body} for the formal version
  • Remark 1: Two definitions of error for agnostic noise
  • Claim 2.0: Improper learners are necessary
  • Claim 2.0: Distinct learners are necessary
  • Claim 3.1: Malicious error can be linearized
  • Claim 3.2: A good hypothesis exists for any distribution of concepts
  • Theorem 5: Special case of \ref{thm:general-optimization}
  • ...and 76 more