Super Non-singular Decompositions of Polynomials and their Application to Robustly Learning Low-degree PTFs

Ilias Diakonikolas, Daniel M. Kane, Vasilis Kontonis, Sihan Liu, Nikos Zarifis

TL;DR

This work tackles robust PAC learning of degree-$d$ polynomial threshold functions on Gaussian data in the presence of a constant fraction of adversarial contamination. It introduces a localization framework that combines a robust margin-perceptron with a novel polynomial set partitioner based on super non-singular decompositions, enabling conditioning on low-margin regions while preserving strong (anti-)concentration. The authors prove an anti-concentration/concentration theory for Gaussian data under super non-singular polynomial transforms and show how to extend decompositions efficiently across online inputs. The resulting algorithm runs in time $n^{O(d)}\mathrm{poly}_{d,c}(1/\epsilon)$ and achieves error $O_{c,d}(\mathrm{opt}^{1-c})+\epsilon$, nearly matching the $d=1$ case in robustness for constant $d$. This framework, including the extendible SN decomposition and polynomial-set partitioning, offers a new toolkit for robustly learning low-degree PTFs beyond linear thresholds. The findings have potential impact on robust learning under structured noise and may influence broader robust-statistical learning problems involving high-degree polynomial classifiers.

Abstract

We study the efficient learnability of low-degree polynomial threshold functions (PTFs) in the presence of a constant fraction of adversarial corruptions. Our main algorithmic result is a polynomial-time PAC learning algorithm for this concept class in the strong contamination model under the Gaussian distribution with error guarantee $O_{d, c}(\text{opt}^{1-c})$, for any desired constant $c>0$, where $\text{opt}$ is the fraction of corruptions. In the strong contamination model, an omniscient adversary can arbitrarily corrupt an $\text{opt}$-fraction of the data points and their labels. This model generalizes the malicious noise model and the adversarial label noise model. Prior to our work, known polynomial-time algorithms in this corruption model (or even in the weaker adversarial label noise model) achieved error $\tilde{O}_d(\text{opt}^{1/(d+1)})$, which deteriorates significantly as a function of the degree $d$. Our algorithm employs an iterative approach inspired by localization techniques previously used in the context of learning linear threshold functions. Specifically, we use a robust perceptron algorithm to compute a good partial classifier and then iterate on the unclassified points. In order to achieve this, we need to take a set defined by a number of polynomial inequalities and partition it into several well-behaved subsets. To this end, we develop new polynomial decomposition techniques that may be of independent interest.
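The strong contamination model described in the abstract can be sketched in a few lines. This is a toy illustration only: the polynomial `p`, the helper names, and the particular adversary (which scales the chosen points and flips their labels) are all assumptions made for this example. The adversary in the paper is omniscient and may replace the chosen pairs with anything.

```python
import random

def ptf_label(p, x):
    """Label of a polynomial threshold function: sign(p(x)) in {-1, +1}."""
    return 1 if p(x) >= 0 else -1

def clean_sample(p, n, m, rng):
    """Draw m points from N(0, I_n) and label each one with sign(p(x))."""
    xs = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    return [(x, ptf_label(p, x)) for x in xs]

def corrupt(sample, opt, rng):
    """Toy adversary (hypothetical): replace an opt-fraction of the pairs.

    Both the point and its label are changed, which is what separates
    strong contamination from pure label noise.
    """
    sample = list(sample)
    k = int(opt * len(sample))
    for i in rng.sample(range(len(sample)), k):
        x, y = sample[i]
        sample[i] = ([10.0 * v for v in x], -y)  # moved point, flipped label
    return sample

def zero_one_error(p, sample):
    """Fraction of pairs on which sign(p(x)) disagrees with the label."""
    return sum(ptf_label(p, x) != y for x, y in sample) / len(sample)

rng = random.Random(0)
p = lambda x: x[0] * x[1] - 0.5  # illustrative degree-2 polynomial
data = corrupt(clean_sample(p, n=2, m=10_000, rng=rng), opt=0.05, rng=rng)
err = zero_one_error(p, data)
print(f"0-1 error of the true PTF on the corrupted sample: {err:.4f}")
```

The true PTF can disagree with the corrupted sample only on replaced pairs, so its empirical error is at most the corruption rate. That is why the error guarantees are stated as functions of $\text{opt}$.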

Paper Structure (47 sections, 46 theorems, 288 equations, 1 figure, 3 algorithms)


Key Result

Theorem 1.2

There exists an algorithm that, given any $c, \epsilon \in (0,1)$, has sample and computational complexity $n^{O(d)}\mathrm{poly}_{d,c}(1/\epsilon)$, and learns the class of degree-$d$ PTFs on $\mathbb{R}^n$ in the nasty noise model under the Gaussian distribution within 0-1 error $O_{c,d}(1) \, \mathrm{opt}^{1-c} + \epsilon$.
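To see why the change in exponent matters, the following sketch compares the prior rate $\text{opt}^{1/(d+1)}$ quoted in the abstract with the new rate $\text{opt}^{1-c}$. It compares exponents only: the hidden factors $O_{c,d}(1)$ and the logarithmic factors in $\tilde{O}_d$ are dropped, and those factors depend on $c$ and $d$ and may be large. The values of `opt` and `c` are arbitrary choices for illustration.

```python
def prior_rate(opt, d):
    """opt^(1/(d+1)), with constants and log factors dropped (assumption)."""
    return opt ** (1.0 / (d + 1))

def new_rate(opt, c):
    """opt^(1-c), with the O_{c,d}(1) factor dropped (assumption)."""
    return opt ** (1.0 - c)

opt, c = 0.01, 0.1  # illustrative values, not taken from the paper
rows = [(d, prior_rate(opt, d), new_rate(opt, c)) for d in (1, 2, 3, 4)]
for d, old, new in rows:
    print(f"d={d}: opt^(1/(d+1)) = {old:.3f}   opt^(1-c) = {new:.3f}")
```

With these values the prior rate grows from 0.1 at $d=1$ to roughly 0.32 at $d=3$, while $\text{opt}^{1-c}$ stays near 0.016 for every $d$.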

Figures (1)

  • Figure 1: The localization region $|p(\mathbf{x}_1, \mathbf{x}_2)| = |\mathbf{x}_1^2 \mathbf{x}_2^2| \leq \epsilon$ is shown in blue. It is essentially a union of two rectangles (shown in the left figure) of width roughly $\sqrt{\epsilon}$. It is easy to see that (i) the total mass of the union is roughly $\sqrt{\epsilon}$; (ii) the expected value of $\mathbf{x}_1^2$ conditioned on the union is roughly $\Theta(1)$ (due to the contribution of the green rectangle). If the conditional distribution were a Gaussian, Carbery-Wright anti-concentration would imply that the conditional probability of $\left| \mathbf{x}_1^2 \right| < \epsilon$ should be at most $\mathrm{poly}(\epsilon)$. In sharp contrast, the mass of the set $\left| \mathbf{x}_1^2 \right| < \epsilon$ conditioned on the union is roughly $\Theta(1)$ (due to the contribution of the orange rectangle). To mitigate the issue, we will partition the low-margin set $|p(\mathbf{x}_1,\mathbf{x}_2)| \leq \epsilon$ into multiple rectangles as in the right figure. Since the Gaussian conditioned on each rectangle is a log-concave distribution, we have the desirable (anti-)concentration properties by [CW:01].
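The failure of anti-concentration described in the caption of Figure 1 can be checked numerically. The sketch below is a plain Monte Carlo estimate, not part of the paper's algorithm. The function name, sample size, seed, and the choice $\epsilon = 10^{-4}$ are assumptions made for illustration.

```python
import random

def simulate(eps, m, seed=0):
    """Estimate three quantities for (x1, x2) ~ N(0, I_2):

    mass        : P[x1^2 * x2^2 <= eps]             (the low-margin region)
    cond_prob   : P[x1^2 < eps | x1^2 * x2^2 <= eps]
    uncond_prob : P[x1^2 < eps]
    """
    rng = random.Random(seed)
    in_region = small_and_in_region = small = 0
    for _ in range(m):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        is_small = x1 * x1 < eps
        small += is_small
        if (x1 * x2) ** 2 <= eps:
            in_region += 1
            small_and_in_region += is_small
    return in_region / m, small_and_in_region / in_region, small / m

mass, cond_prob, uncond_prob = simulate(eps=1e-4, m=200_000)
print(f"mass of region:            {mass:.4f}")
print(f"P[x1^2 < eps | region]:    {cond_prob:.4f}")
print(f"P[x1^2 < eps] (no cond.):  {uncond_prob:.4f}")
```

In such a run the conditional probability should come out far larger than the unconditional one. This is the behavior the caption attributes to the orange rectangle, and it is why the low-margin set is split into rectangles before conditioning.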

Theorems & Definitions (123)

  • Definition 1.1: Nasty Noise or Strong Contamination Model
  • Theorem 1.2: Main Learning Result
  • Theorem 2.1: Informal -- Partitioning the Low-Margin Region of Polynomials
  • Definition 2.2: Super Non-Singular Polynomial Transformation (SNPT)
  • Theorem 2.3: Informal -- Conditional (anti-)concentration for SNPT
  • Definition 2.4: $(\delta, \kappa)$-Reasonable Gaussian
  • Definition 2.5: Distribution Comparability
  • Proposition 2.6: Informal -- Super Non-Singular Polynomial Transformations are Reasonable
  • Theorem 2.7: Informal -- Extendible Super Non-singular Decomposition
  • Definition 3.2: Univariate Hermite Polynomial
  • ...and 113 more
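Definition 3.2 in the list above concerns univariate Hermite polynomials, the standard orthogonal basis for polynomials under the Gaussian distribution. The definition itself is not reproduced on this page, so the convention below is an assumption: the probabilists' polynomials $He_k$, divided by $\sqrt{k!}$ so that they are orthonormal under $N(0,1)$. The function names are hypothetical.

```python
import math

def hermite_he(k, x):
    """Probabilists' Hermite polynomial He_k(x) via the recurrence
    He_{j+1}(x) = x * He_j(x) - j * He_{j-1}(x), He_0 = 1, He_1 = x."""
    prev, cur = 1.0, x
    if k == 0:
        return prev
    for j in range(1, k):
        prev, cur = cur, x * cur - j * prev
    return cur

def hermite_normalized(k, x):
    """He_k(x) / sqrt(k!), orthonormal under N(0, 1) (assumed convention)."""
    return hermite_he(k, x) / math.sqrt(math.factorial(k))

# He_2(x) = x^2 - 1 and He_3(x) = x^3 - 3x, evaluated at sample points.
print(hermite_he(2, 3.0), hermite_he(3, 2.0))
```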