Efficient PAC Learning of Halfspaces with Constant Malicious Noise Rate
Jie Shen
TL;DR
This work tackles PAC learning of homogeneous halfspaces in the malicious noise model. It shows that a constant noise rate can be tolerated in polynomial time when a large-margin condition and a log-concave mixture condition hold simultaneously. The method combines soft outlier removal, which assigns sample weights via a linear program, with minimization of a reweighted hinge loss under a margin constraint. It uses dense-pancake properties of the distribution to control the gradient contribution of corrupted samples. Under these two conditions the algorithm tolerates a noise rate of $\eta = \Omega(1)$. This improves on prior efficient algorithms, whose tolerance depends on the target error rate or on the margin. The analysis has deterministic and probabilistic parts and gives an explicit sample complexity. The approach provides a framework for learning from adversarially corrupted data and suggests extensions to sparse models and, possibly, to linear-time algorithms.
Abstract
Understanding the noise tolerance of machine learning algorithms is a central quest in learning theory. In this work, we study computationally efficient PAC learning of halfspaces in the presence of malicious noise, where an adversary can corrupt both the instances and the labels of training samples. The best-known noise tolerance depends either on a target error rate under distributional assumptions or on a margin parameter under large-margin conditions. We show that when both types of conditions are satisfied, constant noise tolerance can be achieved by minimizing a reweighted hinge loss. Our key ingredients are: 1) an efficient algorithm that finds weights to control the gradient deterioration caused by corrupted samples, and 2) a new analysis of the robustness of the hinge loss equipped with such weights.
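The sketch below shows how a weighted average of the hinge loss max(0, 1 - y<w, x>/tau) might be minimized with given sample weights q. It is intended only to illustrate the reweighted-hinge-loss idea. Several choices here are assumptions and are not taken from the paper: the margin parameter `tau`, the unit-ball constraint, the projected subgradient solver, the step size, and all function names. The weights `q` are treated as already given. The sketch does not reproduce the paper's weight-finding procedure and does not establish its guarantees.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_to_unit_ball(w):
    """Euclidean projection onto {w : ||w||_2 <= 1} (assumed constraint set)."""
    norm = math.sqrt(dot(w, w))
    return list(w) if norm <= 1.0 else [a / norm for a in w]


def reweighted_hinge_loss(w, X, y, q, tau):
    """Weighted average of max(0, 1 - y_i <w, x_i> / tau) with weights q_i >= 0."""
    total = sum(q)
    return sum(qi * max(0.0, 1.0 - yi * dot(w, xi) / tau)
               for xi, yi, qi in zip(X, y, q)) / total


def minimize_reweighted_hinge(X, y, q, tau, steps=500, lr=0.1):
    """Hypothetical solver: projected subgradient descent over the unit ball.

    Returns the best iterate seen, including the starting point w = 0,
    so the returned loss is never larger than the loss at w = 0.
    """
    d = len(X[0])
    total = sum(q)
    w = [0.0] * d
    best_w, best_loss = list(w), reweighted_hinge_loss(w, X, y, q, tau)
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi, qi in zip(X, y, q):
            # Active hinge term contributes -q_i * y_i * x_i / (tau * sum(q)).
            if yi * dot(w, xi) / tau < 1.0:
                for j in range(d):
                    grad[j] -= qi * yi * xi[j] / (tau * total)
        w = project_to_unit_ball([wj - lr * gj for wj, gj in zip(w, grad)])
        loss = reweighted_hinge_loss(w, X, y, q, tau)
        if loss < best_loss:
            best_w, best_loss = list(w), loss
    return best_w


# Hypothetical toy data: two clean points and one corrupted copy of x = (1, 0)
# with a flipped label, assumed to have been down-weighted to q = 0.05.
X = [[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0]]
y = [1, -1, -1]
q = [1.0, 1.0, 0.05]
w_hat = minimize_reweighted_hinge(X, y, q, tau=0.5)
```

In this toy run, the down-weighted corrupted point pulls only weakly against the two clean points, so the learned direction still separates them. With equal weights, the corrupted point would cancel one clean point exactly. This shows, in a very small setting, why controlling the weights of suspicious samples matters for the gradient.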
