Robust Learning of Multi-index Models via Iterative Subspace Approximation
Ilias Diakonikolas, Giannis Iakovidis, Daniel M. Kane, Nikos Zarifis
TL;DR
This work develops a robust, iterative subspace-approximation framework for learning Multi-Index Models (MIMs) with label noise under Gaussian inputs. The algorithm computes low-degree moments conditional on regions of the subspace identified so far and repeatedly extends that subspace with directions of large empirical moments. This yields a general agnostic learner for well-behaved MIMs whose complexity is polynomial in the ambient dimension $d$ when the intrinsic dimension $K$ is constant. The authors instantiate the framework for two concept classes: Multiclass Linear Classifiers, with a constant-factor approximate agnostic learner, and Intersections of Halfspaces, with 0-1 error $K \tilde{O}(\mathrm{OPT}) + ε$. Both learners run in time that is a fixed-degree polynomial in $d$, with variants for random classification noise (RCN) whose complexity scales polynomially with $1/ε$. They also establish SQ lower bounds, for MIMs that are not well-behaved and for Multiclass Linear Classifiers under random noise, indicating that the approach is qualitatively optimal in this distributional setting.
Abstract
We study the task of learning Multi-Index Models (MIMs) with label noise under the Gaussian distribution. A $K$-MIM is any function $f$ that only depends on a $K$-dimensional subspace. We focus on well-behaved MIMs with finite ranges that satisfy certain regularity properties. Our main contribution is a general robust learner that is qualitatively optimal in the Statistical Query (SQ) model. Our algorithm iteratively constructs better approximations to the defining subspace by computing low-degree moments conditional on the projection to the subspace computed thus far, and adding directions with relatively large empirical moments. This procedure efficiently finds a subspace $V$ so that $f(\mathbf{x})$ is close to a function of the projection of $\mathbf{x}$ onto $V$. Conversely, for functions for which these conditional moments do not help, we prove an SQ lower bound suggesting that no efficient learner exists. As applications, we provide faster robust learners for the following concept classes:
* {\bf Multiclass Linear Classifiers} We give a constant-factor approximate agnostic learner with sample complexity $N = O(d) 2^{\mathrm{poly}(K/ε)}$ and computational complexity $\mathrm{poly}(N ,d)$. This is the first constant-factor agnostic learner for this class whose complexity is a fixed-degree polynomial in $d$.
* {\bf Intersections of Halfspaces} We give an approximate agnostic learner for this class achieving 0-1 error $K \tilde{O}(\mathrm{OPT}) + ε$ with sample complexity $N=O(d^2) 2^{\mathrm{poly}(K/ε)}$ and computational complexity $\mathrm{poly}(N ,d)$. This is the first agnostic learner for this class with near-linear error dependence and complexity a fixed-degree polynomial in $d$. Furthermore, we show that in the presence of random classification noise, the complexity of our algorithm scales polynomially with $1/ε$.
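The iterative procedure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: it uses only degree-1 conditional moments, a fixed cubic grid over the current projection, and hand-picked parameters (`width`, `tau`, `min_cell`), all of which are assumptions made for illustration, as are the function names. First moments alone can miss relevant directions in general, which is why the abstract speaks of low-degree moments rather than means.

```python
import numpy as np


def cell_ids(P, width):
    """Map each row of P (points projected onto V) to the index of its grid cube."""
    if P.shape[1] == 0:  # empty subspace: one cell holds every sample
        return np.zeros(P.shape[0], dtype=np.int64)
    grid = np.floor(P / width).astype(np.int64)
    _, ids = np.unique(grid, axis=0, return_inverse=True)
    return ids.reshape(-1)


def extend_subspace(X, y, V, max_new, width=1.0, tau=0.05, min_cell=200):
    """One round: add directions along which conditional label moments are large.

    X: (n, d) samples, y: (n,) labels, V: (d, k) orthonormal basis found so far.
    width, tau, min_cell are illustrative choices, not values from the paper.
    """
    n, d = X.shape
    proj = X @ V
    ids = cell_ids(proj, width)
    X_perp = X - proj @ V.T  # component of each sample orthogonal to V
    M = np.zeros((d, d))
    for c in np.unique(ids):
        in_cell = ids == c
        if in_cell.sum() < min_cell:  # skip cells with too few samples
            continue
        weight = in_cell.mean()  # empirical probability of the cell
        for label in np.unique(y):
            indicator = (y[in_cell] == label)[:, None]
            # Degree-1 moment E[x_perp * 1{y = label} | cell].
            u = (X_perp[in_cell] * indicator).mean(axis=0)
            M += weight * np.outer(u, u)
    evals, evecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    new_dirs = evecs[:, evals > tau][:, -max_new:]
    if new_dirs.shape[1] == 0:
        return V
    Q, _ = np.linalg.qr(np.hstack([V, new_dirs]))  # re-orthonormalize
    return Q


def approximate_subspace(X, y, max_dim, **params):
    """Grow V until no direction passes the threshold or max_dim is reached."""
    V = np.zeros((X.shape[1], 0))
    while V.shape[1] < max_dim:
        V_next = extend_subspace(X, y, V, max_dim - V.shape[1], **params)
        if V_next.shape[1] == V.shape[1]:
            break
        V = V_next
    return V


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((20000, 8))
    y = np.argmax(X[:, :3], axis=1)  # a 3-class linear classifier, noiseless
    V = approximate_subspace(X, y, max_dim=4)
    print(V.shape)
```

In the demo the labels depend only on the differences between the first three coordinates, so the relevant subspace is two-dimensional and the sketch is expected to stop growing once it has been found. A learner in the spirit of the abstract would then fit a predictor on the projection of $\mathbf{x}$ onto $V$; that step is omitted here.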
