Learning Constant-Depth Circuits in Malicious Noise Models

Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan

TL;DR

This paper achieves the best possible dependence on the noise rate and succeeds in the harshest possible noise model (i.e., contamination or so-called "nasty noise") using a simple outlier-removal method combined with Braverman's theorem for fooling constant-depth circuits.

Abstract

The seminal work of Linial, Mansour, and Nisan gave a quasipolynomial-time algorithm for learning constant-depth circuits ($\mathsf{AC}^0$) with respect to the uniform distribution on the hypercube. Extending their algorithm to the setting of malicious noise, where both covariates and labels can be adversarially corrupted, has remained open. Here we achieve such a result, inspired by recent work on learning with distribution shift. Our running time essentially matches their algorithm, which is known to be optimal assuming various cryptographic primitives. Our proof uses a simple outlier-removal method combined with Braverman's theorem for fooling constant-depth circuits. We attain the best possible dependence on the noise rate and succeed in the harshest possible noise model (i.e., contamination or so-called "nasty noise").

Paper Structure

This paper contains 6 sections, 4 theorems, 10 equations, 1 figure, 1 algorithm.

Key Result

Theorem 1.2

For any $s,\ell, d \in {\mathbb{N}}$, and $\epsilon,\delta\in(0,1)$, there is an algorithm that learns the class of $\mathsf{AC}^0$ circuits of size $s$ and depth $\ell$ and achieves error $2\eta + \epsilon$, with running time and sample complexity $d^{O(k)}\log(1/\delta)$, where $k = {(\log(s))^{O(

Figures (1)

  • Figure 1: The diagram shows the input set of points $S_{\mathrm{inp}}$ (red circle), the clean points $S_{\mathrm{cln}}$ (green circle), the output $S_{\mathrm{filt}}$ (black circle) of Algorithm 1, and the sets $S_1$ (yellow region), $S_2$ (blue region), $S_3$ (pink region). The set $S_{\mathrm{inp}}$ consists of clean points, except for an $\eta$ fraction of adversarial points. $S_1$ contains the adversarial points that are filtered out by the outlier removal process, and $S_2$ contains the adversarial points that were not removed and are kept in $S_{\mathrm{filt}}$. $S_3$ contains the clean points that were filtered out during outlier removal. Lemma 3.1 states that $|S_3| \le |S_1|$ w.h.p.
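
The set relationships in the Figure 1 caption can be made concrete with a small bookkeeping sketch. This is not the paper's outlier-removal algorithm: it only computes $S_1$, $S_2$, $S_3$ from three given sets, following the caption's definitions. The function name `partition_filtered_sets` and the integer point identifiers are hypothetical, chosen for illustration, and the filtered set is simply supplied by hand.

```python
def partition_filtered_sets(s_inp, s_cln, s_filt):
    """Split points into the regions S1, S2, S3 described in the Figure 1 caption.

    Assumptions (illustrative only): points are hashable identifiers, and
    s_filt is some subset of s_inp produced by an unspecified filtering step.
    """
    s_inp, s_cln, s_filt = set(s_inp), set(s_cln), set(s_filt)
    if not s_filt <= s_inp:
        raise ValueError("the filtered set must be a subset of the input set")

    adversarial = s_inp - s_cln        # points the adversary injected
    clean_in_input = s_inp & s_cln     # clean points that survived contamination

    s1 = adversarial - s_filt          # adversarial points that were filtered out
    s2 = adversarial & s_filt          # adversarial points that were kept
    s3 = clean_in_input - s_filt       # clean points that were filtered out
    return s1, s2, s3


# Hypothetical example: the adversary replaces clean points 8 and 9 with 100 and 101.
s_cln = set(range(10))
s_inp = (s_cln - {8, 9}) | {100, 101}
# A hand-picked filter output that drops one adversarial and one clean point.
s_filt = s_inp - {100, 7}

s1, s2, s3 = partition_filtered_sets(s_inp, s_cln, s_filt)
print(s1, s2, s3)
print(len(s3) <= len(s1))  # the inequality from the caption, checked on this toy input
```

In this toy input the inequality $|S_3| \le |S_1|$ holds by construction; the caption attributes the guarantee that it holds with high probability to the outlier-removal lemma, which this sketch does not implement.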

Theorems & Definitions (13)

  • Definition 1.1: Learning from Contaminated Samples
  • Theorem 1.2
  • Definition 1.3: Sandwiching polynomials
  • Lemma 3.1: Outlier removal
  • Claim
  • proof
  • Claim
  • proof
  • Claim
  • proof
  • ...and 3 more