Learning single-index models via harmonic decomposition
Nirmit Joshi, Hugo Koubbi, Theodor Misiakiewicz, Nathan Srebro
TL;DR
This work introduces spherical single-index models to study learning under general spherically symmetric inputs, arguing that rotational symmetry makes spherical harmonics the natural basis and Gegenbauer expansions the right tool for analysis. It establishes decoupled lower and upper bounds, one pair per harmonic subspace, and provides two complementary families of estimators: spectral methods for low degrees ($\ell=1,2$) that are sample- or runtime-optimal, and harmonic tensor unfolding or online SGD for higher degrees ($\ell\ge3$) that achieve the corresponding optima. In the Gaussian setting, the theory recovers and clarifies prior results by showing that optimal learnability concentrates in the lowest-frequency subspaces, with the radial component of the input enabling runtime advantages. Overall, the symmetry-driven perspective unifies existing Gaussian SIM results, extends them to arbitrary spherical distributions, and highlights inherent trade-offs in achieving joint optimality across sample and computational resources.
Abstract
We study the problem of learning single-index models, where the label $y \in \mathbb{R}$ depends on the input $\boldsymbol{x} \in \mathbb{R}^d$ only through an unknown one-dimensional projection $\langle \boldsymbol{w}_*,\boldsymbol{x}\rangle$. Prior work has shown that under Gaussian inputs, the statistical and computational complexity of recovering $\boldsymbol{w}_*$ is governed by the Hermite expansion of the link function. In this paper, we propose a new perspective: we argue that spherical harmonics -- rather than Hermite polynomials -- provide the natural basis for this problem, as they capture its intrinsic rotational symmetry. Building on this insight, we characterize the complexity of learning single-index models under arbitrary spherically symmetric input distributions. We introduce two families of estimators -- based on tensor unfolding and online SGD -- that respectively achieve either optimal sample complexity or optimal runtime, and argue that estimators achieving both may not exist in general. When specialized to Gaussian inputs, our theory not only recovers and clarifies existing results but also reveals new phenomena that had previously been overlooked.
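To make the setup concrete, the following is a minimal sketch of a single-index model with spherically symmetric inputs, together with two textbook moment-based estimators for the degree-1 and degree-2 cases. It is an illustration only and is not taken from the paper: the function names (`sample_sim`, `degree1_estimate`, `degree2_estimate`), the fixed radius $\|\boldsymbol{x}\| = \sqrt{d}$, the link functions, and the Gaussian label noise are all assumptions chosen for the example.

```python
import numpy as np

def sample_sim(n, w_star, link, rng, noise=0.1):
    """Draw n samples with x uniform on the sphere of radius sqrt(d) (assumed)
    and y = link(<w_star, x>) + Gaussian noise (assumed)."""
    d = w_star.shape[0]
    g = rng.standard_normal((n, d))
    # Normalizing a Gaussian vector gives a uniformly random direction.
    x = np.sqrt(d) * g / np.linalg.norm(g, axis=1, keepdims=True)
    y = link(x @ w_star) + noise * rng.standard_normal(n)
    return x, y

def degree1_estimate(x, y):
    """Degree-1 moment: by rotational symmetry E[y x] is proportional to w_star."""
    v = (x * y[:, None]).mean(axis=0)
    return v / np.linalg.norm(v)

def degree2_estimate(x, y):
    """Degree-2 moment: top eigenvector (in absolute value) of the
    traceless part of the empirical matrix E[y x x^T]."""
    n, d = x.shape
    m = (x * y[:, None]).T @ x / n
    m -= (np.trace(m) / d) * np.eye(d)  # project out the identity component
    vals, vecs = np.linalg.eigh(m)
    return vecs[:, np.argmax(np.abs(vals))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 20
    w_star = np.zeros(d)
    w_star[0] = 1.0

    # Odd link: information sits in the degree-1 component.
    x, y = sample_sim(20000, w_star, np.tanh, rng)
    print("degree-1 overlap:", abs(degree1_estimate(x, y) @ w_star))

    # Even link: the degree-1 moment vanishes, so use the degree-2 moment.
    x, y = sample_sim(20000, w_star, np.square, rng)
    print("degree-2 overlap:", abs(degree2_estimate(x, y) @ w_star))
```

The degree-2 estimator recovers the direction only up to sign, which is why the overlap is reported in absolute value. For links whose lowest non-vanishing harmonic component has degree three or higher, neither moment carries signal, which is the regime the abstract addresses with tensor unfolding and online SGD.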
