The Power of Iterative Filtering for Supervised Learning with (Heavy) Contamination
Adam R. Klivans, Konstantinos Stavropoulos, Kevin Tian, Arsen Vasilyan
TL;DR
The paper develops a general iterative polynomial filtering framework for efficient supervised learning from contaminated data under distributional assumptions. It shows that low-degree polynomial approximators suffice for learning under bounded contamination (BC-learning) and that low-degree sandwiching polynomials suffice for learning under heavy contamination (HC-learning), yielding near-optimal error guarantees for several fundamental classes under hypercontractive marginals. The approach has two phases: filtering, followed by L1/L2 polynomial regression. It extends to tolerant testable learning of functions of halfspaces with respect to any fixed log-concave distribution. The work also establishes lower bounds that separate bounded from heavy contamination and underscore the need for distribution-specific structure. The results have implications for learning intersections of halfspaces, monotone functions, and convex sets from noisy data.
Abstract
Inspired by recent work on learning with distribution shift, we give a general outlier removal algorithm called iterative polynomial filtering and show a number of striking applications for supervised learning with contamination: (1) We show that any function class that can be approximated by low-degree polynomials with respect to a hypercontractive distribution can be efficiently learned under bounded contamination (also known as nasty noise). This is a surprising resolution to a longstanding gap between the complexity of agnostic learning and learning with contamination, as it was widely believed that low-degree approximators only implied tolerance to label noise. In particular, it implies the first efficient algorithm for learning halfspaces with $\eta$-bounded contamination up to error $2\eta+\epsilon$ with respect to the Gaussian distribution. (2) For any function class that admits the (stronger) notion of sandwiching approximators, we obtain near-optimal learning guarantees even with respect to heavy additive contamination, where far more than $1/2$ of the training set may be added adversarially. Prior related work held only for regression and in a list-decodable setting. (3) We obtain the first efficient algorithms for tolerant testable learning of functions of halfspaces with respect to any fixed log-concave distribution. Even the non-tolerant case for a single halfspace in this setting had remained open. These results significantly advance our understanding of efficient supervised learning under contamination, a setting that has been much less studied than its unsupervised counterpart.
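To make the "filter, then regress" idea concrete, the following is a minimal toy sketch in Python with NumPy. It is not the paper's algorithm, and every specific is an assumption made for illustration: the degree-2 monomial features, the Monte Carlo estimate of the clean second-moment matrix, the stopping threshold `slack`, the per-round removal fraction `drop_frac`, the use of L2 (least-squares) regression, and the synthetic data (Gaussian inliers labeled by a halfspace plus a cluster of mislabeled additive outliers). Each round looks for the polynomial whose empirical second moment most exceeds what the clean marginal allows, and drops the points on which that polynomial is largest.

```python
import numpy as np


def poly_features(X):
    """Degree-2 monomial features [1, x_i, x_i * x_j (i <= j)] per row of X."""
    n, d = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j] for i in range(d) for j in range(i, d)]
    return np.stack(cols, axis=1)


def iterative_filter(phi, ref_moment, slack=2.0, drop_frac=0.02, max_rounds=50):
    """Return a boolean mask of the points kept after filtering.

    phi        : feature matrix of the (possibly contaminated) sample.
    ref_moment : E[phi phi^T] under the clean marginal (assumed known or estimated).
    slack, drop_frac, max_rounds : illustrative knobs, not values from the paper.
    """
    chol = np.linalg.cholesky(ref_moment)
    # Whiten so that clean data has (approximately) identity second moment.
    psi = np.linalg.solve(chol, phi.T).T
    kept = np.ones(len(phi), dtype=bool)
    for _ in range(max_rounds):
        idx = np.flatnonzero(kept)
        S = psi[idx].T @ psi[idx] / len(idx)
        eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order
        if eigvals[-1] <= slack:
            break  # no polynomial has a suspiciously large second moment
        # Score each point by the squared value of the worst-case polynomial.
        scores = (psi[idx] @ eigvecs[:, -1]) ** 2
        n_drop = int(np.ceil(drop_frac * len(idx)))
        kept[idx[np.argsort(scores)[-n_drop:]]] = False
    return kept


def run_demo(seed=0, n_clean=2000, n_out=200):
    """Toy experiment: filter, fit a polynomial by least squares, report counts."""
    rng = np.random.default_rng(seed)
    w = np.array([1.0, 1.0])  # hypothetical target halfspace sign(<w, x>)
    X_clean = rng.standard_normal((n_clean, 2))
    y_clean = np.sign(X_clean @ w)
    # Additive outliers: a far-away cluster carrying the wrong label.
    X_out = np.array([6.0, 6.0]) + 0.1 * rng.standard_normal((n_out, 2))
    y_out = -np.ones(n_out)
    X = np.vstack([X_clean, X_out])
    y = np.concatenate([y_clean, y_out])

    # Estimate the clean moment matrix from fresh draws of the known marginal.
    phi_ref = poly_features(rng.standard_normal((100_000, 2)))
    ref_moment = phi_ref.T @ phi_ref / len(phi_ref)

    phi = poly_features(X)
    kept = iterative_filter(phi, ref_moment)

    # Phase 2: L2 polynomial regression on the surviving points, then threshold.
    coef, *_ = np.linalg.lstsq(phi[kept], y[kept], rcond=None)
    X_test = rng.standard_normal((5000, 2))
    pred = np.sign(poly_features(X_test) @ coef)
    err = float(np.mean(pred != np.sign(X_test @ w)))
    return {
        "outliers_kept": int(kept[n_clean:].sum()),
        "clean_kept": int(kept[:n_clean].sum()),
        "test_error": err,
    }


if __name__ == "__main__":
    print(run_demo())
```

In this toy setting the outlier cluster inflates the second moment of a particular degree-2 polynomial far beyond what a standard Gaussian permits, so the filter removes it within a few rounds while discarding only a small number of clean points. The sketch says nothing about the error guarantees or the heavy-contamination regime described in the abstract.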
