Smoothed Analysis for Learning Concepts with Low Intrinsic Dimension

Gautam Chandrasekaran; Adam Klivans; Vasilis Kontonis; Raghu Meka; Konstantinos Stavropoulos

TL;DR

This work introduces a $\sigma$-smoothed agnostic learning framework in which the learner must compete only with the best classifier that is robust to small Gaussian perturbations of its input, rather than with the worst-case optimal classifier. The framework is used to study concepts with low intrinsic dimension and bounded Gaussian surface area. The approach is based on polynomial approximation: the Ornstein–Uhlenbeck operator and a density-ratio technique reduce the problem to low-degree polynomial $L_1$ regression, which gives learning guarantees under sub-Gaussian and bounded marginals, with dimension reduction used to improve scalability. Results include an algorithm for agnostically learning intersections of $k$ halfspaces with margin $\gamma$ in time $k^{\mathrm{poly}(\frac{\log k}{\epsilon\gamma})}$, where the best previously known runtime was exponential in $k$. The framework also extends to margin-based, smoothed-distribution, and anti-concentration settings, and SQ lower bounds delineate the inherent hardness of some regimes. Open directions include improving runtimes and extending the tail conditions.

Abstract

In traditional models of supervised learning, the goal of a learner -- given examples from an arbitrary joint distribution on $\mathbb{R}^d \times \{\pm 1\}$ -- is to output a hypothesis that is competitive (to within $\epsilon$) with the best fitting concept from some class. In order to escape strong hardness results for learning even simple concept classes, we introduce a smoothed-analysis framework that requires a learner to compete only with the best classifier that is robust to small random Gaussian perturbations. This subtle change allows us to give a wide array of learning results for any concept that (1) depends on a low-dimensional subspace (aka multi-index model) and (2) has a bounded Gaussian surface area. This class includes functions of halfspaces and (low-dimensional) convex sets, cases that are only known to be learnable in non-smoothed settings with respect to highly structured distributions such as Gaussians. Surprisingly, our analysis also yields new results for traditional non-smoothed frameworks such as learning with margin. In particular, we obtain the first algorithm for agnostically learning intersections of $k$ halfspaces in time $k^{\mathrm{poly}(\frac{\log k}{\epsilon\gamma})}$, where $\gamma$ is the margin parameter. Before our work, the best known runtime was exponential in $k$ (Arriaga and Vempala, 1999).
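To make the notion of a classifier that is "robust to small random Gaussian perturbation" concrete, the following is a minimal sketch that estimates the error of a fixed classifier when its input is perturbed by Gaussian noise of scale $\sigma$. It is an illustration only, not the paper's algorithm: the helper names `halfspace` and `smoothed_error`, the toy data, and the Monte Carlo parameters are all assumptions introduced here.

```python
import random


def halfspace(w, b):
    """Return the classifier x -> sign(<w, x> + b), with values in {-1, +1}."""
    def f(x):
        s = sum(wi * xi for wi, xi in zip(w, x)) + b
        return 1 if s >= 0 else -1
    return f


def smoothed_error(f, samples, sigma, n_perturb=200, seed=0):
    """Monte Carlo estimate of the error of f on inputs x + sigma * z, z ~ N(0, I).

    `samples` is a list of (x, y) pairs with x a list of floats and y in {-1, +1}.
    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    mistakes = 0
    for x, y in samples:
        for _ in range(n_perturb):
            # Perturb every coordinate with independent Gaussian noise.
            x_pert = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
            mistakes += (f(x_pert) != y)
    return mistakes / (len(samples) * n_perturb)


if __name__ == "__main__":
    # Toy data: labels come from the halfspace x_1 >= 0, and every point
    # lies at distance at least 0.5 from the decision boundary.
    rng = random.Random(1)
    data = []
    while len(data) < 200:
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        if abs(x[0]) >= 0.5:
            data.append((x, 1 if x[0] >= 0 else -1))

    f = halfspace([1.0, 0.0], 0.0)
    for sigma in (0.0, 0.1, 1.0):
        print(sigma, smoothed_error(f, data, sigma))
```

On data with a margin, the perturbed error stays close to the unperturbed error as long as $\sigma$ is small relative to the margin, which gives some intuition for why the smoothed framework also says something about learning with margin.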

Paper Structure

This paper contains 42 sections, 50 theorems, 88 equations, 3 tables, 2 algorithms.

Introduction
Our Smoothed Learning Model
Learning Concepts with Low Intrinsic Dimension
Our Results
Measure of Complexity: Gaussian Surface Area
Main Results: Smoothed Agnostic Learning under Concentration
Applications
Agnostic Learning with Margin
Agnostic Learning under Smoothed Distributions
Agnostic Learning under Anti-concentration
Technical Overview
Polynomial Approximation in the Low-Dimensional Space
Duality Between Input and Smoothing Parameter
From Approximating $f(\cdot)$ to Approximating Density Ratios
Dimension Reduction and Polynomial Regression
...and 27 more sections

Key Result

Theorem 1.4

Let $D$ be a distribution on $\mathbb{R}^{d}\times \{\pm 1\}$ with sub-gaussian $\mathbf{x}$-marginal. There exists an algorithm that learns the class $\mathcal{F}(k, \Gamma)$ in the $\sigma$-smoothed setting with $N = d^{\mathrm{poly}(\frac{k\Gamma}{\sigma\epsilon})}\log(\frac{1}{\delta})$ samples
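As a reading aid, the guarantee behind "learns in the $\sigma$-smoothed setting" can be sketched as follows, based on the abstract's description of competing only with the best classifier that is robust to Gaussian perturbation. The symbols $\mathrm{opt}_\sigma$, $h$ (the learner's output hypothesis), and $\mathbf{z}$ (the perturbation) are notation assumed here for illustration; the paper's Definition 1.1 is the authoritative statement.

```latex
% Sketch under stated assumptions: z ~ N(0, I_d) is drawn independently of (x, y),
% and h denotes the hypothesis returned by the learner.
\mathrm{opt}_\sigma
  \;=\; \inf_{f \in \mathcal{F}(k,\Gamma)}
        \mathbb{E}_{\mathbf{z} \sim \mathcal{N}(0, I_d)}
        \Big[ \Pr_{(\mathbf{x}, y) \sim D}
              \big[ f(\mathbf{x} + \sigma \mathbf{z}) \neq y \big] \Big],
\qquad
\Pr_{(\mathbf{x}, y) \sim D} \big[ h(\mathbf{x}) \neq y \big]
  \;\le\; \mathrm{opt}_\sigma + \epsilon
\quad \text{with probability at least } 1 - \delta .
```

Note that the hypothesis $h$ is evaluated on unperturbed inputs; only the benchmark it competes against is smoothed.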

Theorems & Definitions (102)

Definition 1.1: Smoothed Optimality
Definition 1.2: Low-Dimensional, Bounded Surface Area Concepts
Remark 1.3
Theorem 1.4: Sub-Gaussian (informal; see also thm:smooth_learning_subgaussian-main)
Theorem 1.5: Bounded (informal; see also thm:random_proj_bounded_main)
Corollary 1.6: Intersections of $k$ halfspaces with $\gamma$-margin
Corollary 1.7: Informal; see also corollary: agnostic_smoothed_subexp
Corollary 1.8: Informal; see also corollary: agnostic_subexp_functions_halfspaces
Proposition 3.1: Polynomial Approximation of Random Translations
Remark 3.2
...and 92 more
