Attribute-Efficient PAC Learning of Low-Degree Polynomial Threshold Functions with Nasty Noise
Shiwei Zeng, Jie Shen
TL;DR
This paper addresses PAC learning of $K$-sparse degree-$d$ polynomial threshold functions (PTFs) on $\mathbb{R}^n$ under Gaussian marginals in the presence of nasty noise. The authors leverage a structural sparsity result in the Hermite basis, showing that the Chow vector $\chi_{f^*}$ of the target is sparse, with its support determined by the $K$ relevant attributes. They then develop an attribute-efficient robust Chow-vector estimator (SparseFilter) that uses a restricted Frobenius-norm criterion either to certify a good approximation or to filter corrupted samples. The main result is an algorithm that runs in time $(nd/\epsilon)^{O(d)}$, uses $O\big( K^{4d}(d\log n)^{5d} / \epsilon^{2d+2} \big)$ samples, and tolerates a nasty-noise rate of up to $\eta \le O(\epsilon^{d+1}/d^{2d})$. This yields PAC learning of $\mathcal{H}_{d,K}$ (the class of $K$-sparse degree-$d$ PTFs) under Gaussian marginals with a noise tolerance that is independent of the dimension $n$. The work generalizes prior attribute-efficient robustness results from sparse homogeneous halfspaces to general sparse low-degree PTFs.
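The basic mechanism behind the sparsity of the Chow vector can be checked numerically. Under $N(0, I_n)$ the coordinates are independent and every nonconstant normalized Hermite polynomial has mean zero, so if $f$ depends only on a set $S$ of $K$ attributes, then $\mathbb{E}[f(x) H_\alpha(x)] = 0$ for every multi-index $\alpha$ with a nonzero entry outside $S$. The sketch below is a minimal illustration under stated assumptions, not the paper's construction: it uses normalized probabilists' Hermite polynomials $He_j/\sqrt{j!}$, and the example PTF, the sample size, and the helper names (`hermite_1d`, `multi_indices`, `empirical_chow`) are hypothetical choices made for illustration.

```python
import itertools
import math

import numpy as np
from numpy.polynomial.hermite_e import hermeval


def hermite_1d(x, j):
    """Normalized probabilists' Hermite polynomial He_j(x) / sqrt(j!)."""
    coeffs = np.zeros(j + 1)
    coeffs[j] = 1.0
    return hermeval(x, coeffs) / math.sqrt(math.factorial(j))


def multi_indices(n, d):
    """All multi-indices alpha in N^n with total degree |alpha| <= d."""
    alphas = []
    for k in range(d + 1):
        for combo in itertools.combinations_with_replacement(range(n), k):
            alpha = [0] * n
            for i in combo:
                alpha[i] += 1
            alphas.append(tuple(alpha))
    return alphas


def empirical_chow(X, y, d):
    """Empirical Chow vector: the mean of y * H_alpha(x) for every |alpha| <= d."""
    alphas = multi_indices(X.shape[1], d)
    chow = np.empty(len(alphas))
    for col, alpha in enumerate(alphas):
        h = np.ones(X.shape[0])
        for i, a in enumerate(alpha):
            if a > 0:
                h *= hermite_1d(X[:, i], a)
        chow[col] = np.mean(y * h)
    return alphas, chow


# Demo (hypothetical): a degree-2 PTF on R^6 that depends only on the first K = 2 attributes.
rng = np.random.default_rng(0)
n, d, K, m = 6, 2, 2, 100_000
X = rng.standard_normal((m, n))
y = np.sign(X[:, 0] ** 2 + X[:, 0] * X[:, 1] - 0.5)

alphas, chow = empirical_chow(X, y, d)
touches_irrelevant = [any(a[i] > 0 for i in range(K, n)) for a in alphas]
irrelevant_max = max(abs(c) for c, t in zip(chow, touches_irrelevant) if t)
relevant_max = max(abs(c) for c, t in zip(chow, touches_irrelevant) if not t)
print(f"largest |coefficient| off the relevant attributes: {irrelevant_max:.4f}")
print(f"largest |coefficient| on the relevant attributes:  {relevant_max:.4f}")
```

Coefficients that touch an irrelevant attribute are zero in expectation, so their empirical values shrink at the rate $1/\sqrt{m}$. At most $\binom{K+d}{d}$ entries of the degree-$\le d$ Chow vector can be nonzero, which is the kind of sparsity an attribute-efficient estimator can exploit.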
Abstract
The concept class of low-degree polynomial threshold functions (PTFs) plays a fundamental role in machine learning. In this paper, we study PAC learning of $K$-sparse degree-$d$ PTFs on $\mathbb{R}^n$, where any such concept depends only on $K$ out of the $n$ attributes of the input. Our main contribution is a new algorithm that runs in time $(nd/\epsilon)^{O(d)}$ and, under the Gaussian marginal distribution, PAC learns the class up to error rate $\epsilon$ with $O(\frac{K^{4d}}{\epsilon^{2d}} \cdot \log^{5d} n)$ samples, even when an $\eta \leq O(\epsilon^d)$ fraction of them are corrupted by the nasty noise of Bshouty et al. (2002), possibly the strongest corruption model. Prior to this work, attribute-efficient robust algorithms had been established only for the special case of sparse homogeneous halfspaces. Our key ingredients are: 1) a structural result that translates attribute sparsity into a sparsity pattern of the Chow vector in the basis of Hermite polynomials, and 2) a novel attribute-efficient robust Chow vector estimation algorithm that relies exclusively on a restricted Frobenius norm either to certify a good approximation or to validate a sparsity-induced degree-$2d$ polynomial as a filter for detecting corrupted samples.
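The certify-or-filter loop in ingredient 2) follows the filter paradigm from robust mean estimation. The sketch below illustrates that paradigm in a deliberately simplified setting, which is an assumption and not the paper's SparseFilter. It robustly estimates the mean of $N(\mu, I_p)$ with an $s$-sparse $\mu$ directly, rather than the Chow vector of Hermite features, so the filter polynomial is degree 2 in the samples instead of degree $2d$ in $x$. The "restricted Frobenius norm" is taken here to be the Frobenius norm of the $s^2$ largest-magnitude entries of $\hat\Sigma - I$. The threshold, the fixed-quantile removal rule, the outlier construction, and the helper names (`top_entries_mask`, `sparse_filter_mean`) are all hypothetical choices made for illustration.

```python
import numpy as np


def top_entries_mask(M, k):
    """Boolean mask selecting the k largest-magnitude entries of M."""
    flat = np.abs(M).ravel()
    idx = np.argpartition(flat, flat.size - k)[flat.size - k:]
    mask = np.zeros(flat.size, dtype=bool)
    mask[idx] = True
    return mask.reshape(M.shape)


def sparse_filter_mean(Z, s, threshold, quantile=0.98, max_rounds=50):
    """Simplified certify-or-filter loop for robust s-sparse mean estimation.

    Assumes clean samples are N(mu, I) with an s-sparse mu (hypothetical setting).
    Returns the s-sparse estimate and the samples that survived filtering.
    """
    Z = np.asarray(Z, dtype=float)
    p = Z.shape[1]
    for _ in range(max_rounds):
        mu_hat = Z.mean(axis=0)
        centered = Z - mu_hat
        dev = centered.T @ centered / len(Z) - np.eye(p)
        mask = top_entries_mask(dev, min(s * s, dev.size))
        # Restricted Frobenius norm: Frobenius norm over the selected entries only.
        if np.linalg.norm(dev[mask]) <= threshold:
            break  # certified: no sparse block shows inflated variance
        # Sparsity-induced quadratic filter built from the selected entries of dev.
        A = np.where(mask, dev, 0.0)
        A = (A + A.T) / 2
        scores = np.einsum("ij,jk,ik->i", centered, A, centered)
        # Drop the highest-scoring samples (simple fixed-quantile rule).
        Z = Z[scores < np.quantile(scores, quantile)]
    # Hard-threshold the empirical mean of the surviving samples to s coordinates.
    mu_hat = Z.mean(axis=0)
    estimate = np.zeros(p)
    keep = np.argsort(np.abs(mu_hat))[-s:]
    estimate[keep] = mu_hat[keep]
    return estimate, Z


# Demo (hypothetical): p = 50, s = 3, with a 10% cluster of outliers on coordinates 3..5.
rng = np.random.default_rng(1)
p, s, m, eta = 50, 3, 5000, 0.1
mu = np.zeros(p)
mu[:3] = 3.0
n_out = int(eta * m)
clean = mu + rng.standard_normal((m - n_out, p))
target = np.zeros(p)
target[3:6] = 8.0
outliers = target + 0.1 * rng.standard_normal((n_out, p))
Z = np.vstack([clean, outliers])

estimate, kept = sparse_filter_mean(Z, s=s, threshold=1.0)
naive_error = np.linalg.norm(Z.mean(axis=0) - mu)
filtered_error = np.linalg.norm(estimate - mu)
print(f"naive mean error: {naive_error:.3f}, filtered error: {filtered_error:.3f}")
```

Removing a fixed top quantile per round is the simplest deterministic rule. Filter analyses typically use threshold-based or randomized removal instead, so that more corrupted than clean samples are discarded at each step.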
