Agnostic Active Learning of Single Index Models with Linear Sample Complexity

Aarshvi Gajjar, Wai Ming Tai, Xingyu Xu, Chinmay Hegde, Yi Li, Christopher Musco

TL;DR

This work addresses agnostic active learning for single index models of the form $h(\boldsymbol{x})=f(\boldsymbol{x}^\top\boldsymbol{w})$ under adversarial noise. It introduces a leverage-score sampling framework whose sample complexity is near-linear in the ambient dimension. When $f$ is known and Lipschitz, $\tilde{O}(d)$ labeled samples suffice. The guarantee extends to unknown Lipschitz $f$ at the cost of only a logarithmic factor in $n$. The key technical contributions are nonlinear subspace embeddings for Lipschitz nonlinearities, a distribution-aware discretization of the class $\mathrm{Lip}_L$ of $L$-Lipschitz functions, and the use of Dudley’s integral together with dual Sudakov minoration to obtain tight, distribution-free concentration bounds. These results improve quadratically on the prior $O(d^2)$ bound for known $f$ and provide robust, scalable guarantees for PDE surrogate modeling and related scientific ML tasks. They also open directions for computation-focused analyses and multi-index generalizations.
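For reference, the quantities behind a leverage-score framework can be written out explicitly. This is a sketch using standard definitions, not the paper's exact statements. It assumes $\boldsymbol{X}\in\mathbb{R}^{n\times d}$ has rows $\boldsymbol{x}_i^\top$ and that $\boldsymbol{y}\in\mathbb{R}^n$ holds the possibly adversarial labels. The paper's Definition 5 may phrase the leverage score in a different but equivalent form. The identity $\sum_i \tau_i = \operatorname{rank}(\boldsymbol{X}) \le d$ is the standard reason leverage-score sampling budgets are stated in terms of $d$ rather than $n$.

```latex
% Agnostic objective for a fixed link f: no assumption that y = f(Xw) + noise
\mathrm{OPT} = \min_{\boldsymbol{w}\in\mathbb{R}^d} \bigl\|f(\boldsymbol{X}\boldsymbol{w}) - \boldsymbol{y}\bigr\|_2^2

% Statistical leverage score of row i (pseudoinverse form and variational form)
\tau_i(\boldsymbol{X}) = \boldsymbol{x}_i^\top(\boldsymbol{X}^\top\boldsymbol{X})^{+}\boldsymbol{x}_i
  = \max_{\boldsymbol{w}:\,\boldsymbol{X}\boldsymbol{w}\neq \boldsymbol{0}}
    \frac{(\boldsymbol{x}_i^\top\boldsymbol{w})^2}{\|\boldsymbol{X}\boldsymbol{w}\|_2^2}

% Each score lies in [0, 1], and the scores sum to the rank of X
\sum_{i=1}^{n} \tau_i(\boldsymbol{X}) = \operatorname{rank}(\boldsymbol{X}) \le d
```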

Abstract

We study active learning methods for single index models of the form $F({\mathbf x}) = f(\langle {\mathbf w}, {\mathbf x}\rangle)$, where $f:\mathbb{R} \to \mathbb{R}$ and ${\mathbf x},{\mathbf w} \in \mathbb{R}^d$. In addition to their theoretical interest as simple examples of non-linear neural networks, single index models have received significant recent attention due to applications in scientific machine learning like surrogate modeling for partial differential equations (PDEs). Such applications require sample-efficient active learning methods that are robust to adversarial noise, i.e., that work even in the challenging agnostic learning setting. We provide two main results on agnostic active learning of single index models. First, when $f$ is known and Lipschitz, we show that $\tilde{O}(d)$ samples collected via \emph{statistical leverage score sampling} are sufficient to learn a near-optimal single index model. Leverage score sampling is simple to implement, efficient, and already widely used for actively learning linear models. Our result requires no assumptions on the data distribution, is optimal up to log factors, and improves quadratically on a recent ${O}(d^{2})$ bound of \cite{gajjar2023active}. Second, we show that $\tilde{O}(d)$ samples suffice even in the more difficult setting when $f$ is \emph{unknown}. Our results leverage tools from high-dimensional probability, including Dudley's inequality and dual Sudakov minoration, as well as a novel, distribution-aware discretization of the class of Lipschitz functions.
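A minimal sketch of leverage score sampling follows, assuming NumPy and i.i.d. sampling with replacement. The paper's own sampling process may differ, for example by sampling each row independently. The helper names `leverage_scores` and `leverage_score_sample` are hypothetical. Rows are drawn with probability proportional to their leverage scores, and each kept row gets the weight $1/\sqrt{m p_i}$. Only the $m$ sampled labels need to be queried.

```python
import numpy as np

def leverage_scores(X):
    """Leverage scores tau_i = x_i^T (X^T X)^+ x_i of the rows of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Thin SVD X = U S V^T: tau_i is the squared norm of row i of U,
    # keeping only columns with nonzero singular values (handles rank deficiency).
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    tol = s[0] * max(X.shape) * np.finfo(float).eps
    U = U[:, s > tol]
    return np.sum(U**2, axis=1)

def leverage_score_sample(X, m, rng):
    """Draw m row indices i.i.d. with p_i = tau_i / sum(tau) (hypothetical helper).

    Returns the sampled indices and importance weights 1/sqrt(m * p_i).
    """
    tau = leverage_scores(X)
    p = tau / tau.sum()
    idx = rng.choice(len(p), size=m, replace=True, p=p)
    weights = 1.0 / np.sqrt(m * p[idx])
    return idx, weights
```

For linear regression, a learner would then query $y_i$ only for `i in idx` and fit on `weights[:, None] * X[idx]` against `weights * y[idx]`. For a single index model, the weights multiply the residuals $f(\mathbf{x}_i^\top\mathbf{w}) - y_i$ instead, because $f$ does not commute with scaling. Computing scores from the SVD, rather than by inverting $\mathbf{X}^\top\mathbf{X}$, keeps the computation stable when $\mathbf{X}$ is ill-conditioned or rank-deficient.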

Paper Structure (53 sections, 26 theorems, 164 equations)

Table of Contents

Introduction
Single Index Models
Comparison to Non-active Supervised Learning
Our Results
Leverage score sampling.
Technical contributions in Theorem \ref{thm:main_known_f}.
Technical contributions in Theorem \ref{thm:unknown_f}.
Additional Discussion of Related Work
Remark on computational efficiency.
Notation
Preliminaries
Subsampled regression
Sampling process.
Main Results
Proof of the non-linear concentrations
...and 38 more sections

Key Result

Theorem 1 (Theorem 1 of \cite{gajjar2023active})

Let $f$ be a fixed $L$-Lipschitz function, let $\bm{X}\in \mathbb{R}^{n\times d}$ be a data matrix with label vector $\bm{y}\in\mathbb{R}^n$, and let $\bm{w}^\star = \arg\min_{\bm{w}} \|f(\bm{X}\bm{w}) - \bm{y}\|_2^2$. There is an algorithm that, for any $\varepsilon \in (0,1)$, observes $\tilde{O}\left(d^2\cdot \frac{L^8}{\varepsilon^4}\right)$ […] with high probability. Above, $f(\bm{X}\bm{w})$ denotes the entrywise application of $f$ to the […]
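The objective in this theorem requires every label $y_i$. Below is a minimal sketch, assuming NumPy, of the kind of reweighted subsampled objective a sampling-based active learner can minimize instead. With rows drawn i.i.d. with probabilities $p_i$, scaling each squared residual by $1/(m p_i)$ gives an unbiased estimate of $\|f(\bm{X}\bm{w}) - \bm{y}\|_2^2$ for every fixed $\bm{w}$. The function names are hypothetical. The sketch illustrates only this estimator, not the algorithm or analysis behind the theorem.

```python
import numpy as np

def single_index_predict(f, X, w):
    """Entrywise application of the scalar link f to X @ w, i.e. f(Xw)."""
    return f(X @ w)

def full_loss(f, X, y, w):
    """Full objective ||f(Xw) - y||_2^2 over all n rows (needs every label)."""
    r = single_index_predict(f, X, w) - y
    return float(r @ r)

def subsampled_loss(f, X, y, w, idx, p):
    """Reweighted loss on m rows idx drawn i.i.d. with probabilities p.

    Each squared residual is scaled by 1/(m * p_i), so only the labels
    y[idx] are needed and the estimate is unbiased for full_loss.
    """
    m = len(idx)
    r = single_index_predict(f, X[idx], w) - y[idx]
    return float(np.sum(r**2 / (m * p[idx])))
```

Any Lipschitz link can be passed as `f`, for example `np.tanh` (1-Lipschitz) or `lambda t: np.maximum(t, 0.0)` (ReLU). Unbiasedness at each fixed $\bm{w}$ is not enough by itself: guaranteeing a near-optimal minimizer requires the estimate to be accurate uniformly over $\bm{w}$.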

Theorems & Definitions (37)

Theorem 1: Theorem 1 from gajjar2023active
Theorem 2
Theorem 3
Definition 4: $\varepsilon$-accurate solution
Definition 5: Statistical leverage score
Lemma 6: Non-linear subspace embedding with fixed non-linearity
Lemma 7: Subspace embedding
proof
Lemma 8: Non-linear subspace embedding with unknown non-linearity
proof
...and 27 more