A Fully Polynomial-Time Algorithm for Robustly Learning Halfspaces over the Hypercube
Gautam Chandrasekaran, Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan
TL;DR
This work delivers the first fully polynomial-time algorithm for agnostically learning halfspaces under the uniform distribution on the Boolean hypercube in the presence of contamination, achieving error on the order of $\mathrm{opt}^{O(1)}+ε$. The authors combine a robust, phase-based approach with Generalized Linear Models (GLMs): a GLM component handles the heavy coefficients, hinge-loss learning handles the regular tail coefficients, and influential coordinates are identified robustly using techniques inspired by Chow parameters. A key technical advance is a GLM-learning procedure whose runtime is polylogarithmic in the activation's Lipschitz constant, enabling efficient learning of sigmoidal activations with weights bounded independently of dimension. The framework tolerates bounded contamination and, via a reduction relating adaptive and oblivious adversaries, yields poly(d,1/ε)-time guarantees with error polynomial in the contamination rate. Overall, the paper shows that supervised learning over discrete distributions such as the uniform distribution on the hypercube can be done robustly and efficiently, narrowing a notable gap between the discrete and continuous settings and extending GLM tools to robust, high-dimensional, discrete learning problems.
Abstract
We give the first fully polynomial-time algorithm for learning halfspaces with respect to the uniform distribution on the hypercube in the presence of contamination, where an adversary may corrupt some fraction of examples and labels arbitrarily. We achieve an error guarantee of $η^{O(1)}+ε$ where $η$ is the noise rate. Such a result was not known even in the agnostic setting, where only labels can be adversarially corrupted. All prior work over the last two decades has a superpolynomial dependence in $1/ε$ or succeeds only with respect to continuous marginals (such as log-concave densities). Previous analyses rely heavily on various structural properties of continuous distributions such as anti-concentration. Our approach avoids these requirements and makes use of a new algorithm for learning Generalized Linear Models (GLMs) with only a polylogarithmic dependence on the activation function's Lipschitz constant. More generally, our framework shows that supervised learning with respect to discrete distributions is not as difficult as previously thought.
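To make the contamination setting concrete, the following minimal sketch simulates it: points are drawn uniformly from the hypercube, labeled by a halfspace, and an $η$ fraction of the labeled examples is then replaced by arbitrary pairs. It illustrates only the data model, not the paper's learning algorithm. The function names, the majority halfspace used as the target, and the particular adversary (which substitutes a single mislabeled point) are all assumptions made for illustration.

```python
import random


def sample_hypercube(d, rng):
    """Draw one point uniformly at random from {-1, +1}^d."""
    return [rng.choice((-1, 1)) for _ in range(d)]


def halfspace(w, theta, x):
    """Return sign(<w, x> - theta), mapping 0 to +1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else -1


def contaminate(samples, eta, bad_pair, rng):
    """Replace int(eta * n) of the (x, y) pairs with `bad_pair`.

    Hypothetical adversary for illustration: a real adversary may choose
    any replacement points and labels, not a single fixed pair.
    """
    n = len(samples)
    k = int(eta * n)
    corrupted = list(samples)
    for i in rng.sample(range(n), k):
        corrupted[i] = bad_pair
    return corrupted, k


def empirical_error(w, theta, samples):
    """Fraction of pairs (x, y) on which the halfspace disagrees with y."""
    wrong = sum(1 for x, y in samples if halfspace(w, theta, x) != y)
    return wrong / len(samples)


rng = random.Random(0)
d, n, eta = 11, 1000, 0.1

# Assumed target: the majority function, i.e. all weights 1 and threshold 0.
w, theta = [1] * d, 0

clean = []
for _ in range(n):
    x = sample_hypercube(d, rng)
    clean.append((x, halfspace(w, theta, x)))

# The all-ones point has true label +1, so labeling it -1 is a corruption.
bad_pair = ([1] * d, -1)
noisy, k = contaminate(clean, eta, bad_pair, rng)

err_clean = empirical_error(w, theta, clean)
err_noisy = empirical_error(w, theta, noisy)
```

In this sketch the target halfspace has zero error on the clean sample and error exactly `k / n` on the contaminated one, which is the baseline that an $η^{O(1)}+ε$ guarantee is measured against.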
