Near-optimal Active Regression of Single-Index Models

Yi Li; Wai Ming Tai

TL;DR

This work studies active regression for single-index models with a Lipschitz activation $f$, where the matrix $A$ is fully accessible but the label vector $b$ can be read only through entry queries. The authors design a sample-then-solve algorithm that minimizes the regularized objective $\min_x \|S(f(Ax)-b)\|_p^p + \varepsilon\|Ax\|_p^p$ and combines it with domain shrinking and two-stage sampling to obtain a $(1+\varepsilon)$-approximation with query complexity $\tilde{O}(d^{\frac{p}{2}\vee 1}/\varepsilon^{p\vee 2})$, where $a\vee b$ denotes $\max\{a,b\}$. This bound is tight up to polylogarithmic factors for $1\le p\le 2$, and its $1/\varepsilon^p$ dependence is optimal for $p>2$. The analysis combines Lewis-weight-based subspace embeddings, Dudley’s integral, and concentration arguments to bound the sampling error uniformly over a controllable domain. The lower bounds are established through a framework based on Yao’s minimax theorem, and a separate technique removes the dependence on the ambient dimension $n$, so that the query complexity scales primarily with the dimension $d$ and the accuracy $\varepsilon$. For Lipschitz $f$, previous results obtained only constant-factor approximations.
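The sample-then-solve idea with the regularized objective can be illustrated by a minimal sketch. It is restricted to $p=2$, where Lewis weights coincide with leverage scores, and it uses a single sampling stage; it is not the authors' algorithm, which also relies on domain shrinking and two-stage sampling. The function names (`leverage_scores`, `sample_rows`, `regularized_loss`) are hypothetical, and the minimization over $x$ is left to any nonconvex solver.

```python
import numpy as np

def leverage_scores(A):
    # Squared row norms of an orthonormal basis of col(A).
    # Assumption: p = 2, where these equal the Lewis weights.
    Q, _ = np.linalg.qr(A)
    return np.sum(Q**2, axis=1)

def sample_rows(A, m, rng):
    # Draw m row indices with probability proportional to leverage score.
    w = leverage_scores(A)
    prob = w / w.sum()
    idx = rng.choice(A.shape[0], size=m, replace=True, p=prob)
    # Importance weights so the sampled squared loss is unbiased.
    scale = 1.0 / (m * prob[idx])
    return idx, scale

def regularized_loss(x, A, idx, scale, b_queried, f, eps):
    # ||S(f(Ax) - b)||_2^2 + eps * ||Ax||_2^2, where S samples and rescales rows.
    # Only the entries b[idx] are needed, i.e. only m label queries.
    resid = f(A[idx] @ x) - b_queried
    return np.sum(scale * resid**2) + eps * np.sum((A @ x) ** 2)
```

Note that the regularizer $\varepsilon\|Ax\|_2^2$ is computed on the full matrix $A$, which costs no label queries because $A$ is fully accessible; only the first term touches $b$.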

Abstract

The active regression problem of the single-index model is to solve $\min_x \lVert f(Ax)-b\rVert_p$, where $A$ is fully accessible and $b$ can only be accessed via entry queries, with the goal of minimizing the number of queries to the entries of $b$. When $f$ is Lipschitz, previous results obtain only constant-factor approximations. This work presents the first algorithm that provides a $(1+\varepsilon)$-approximate solution by querying $\tilde{O}(d^{\frac{p}{2}\vee 1}/\varepsilon^{p\vee 2})$ entries of $b$. This query complexity is shown to be optimal up to logarithmic factors for $p\in [1,2]$, and the $\varepsilon$-dependence of $1/\varepsilon^p$ is shown to be optimal for $p>2$.
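To make the stated bound concrete, the following small helper evaluates its leading term $d^{\frac{p}{2}\vee 1}/\varepsilon^{p\vee 2}$, reading $a\vee b$ as $\max\{a,b\}$. The name `query_bound` is hypothetical, and the polylogarithmic factors and constants hidden by $\tilde{O}$ are ignored, so the values indicate scaling only, not actual query counts.

```python
def query_bound(d: int, eps: float, p: float) -> float:
    """Leading term d^{max(p/2, 1)} / eps^{max(p, 2)} of the query complexity.

    Polylog factors and constants hidden by the O-tilde are ignored.
    """
    return d ** max(p / 2, 1) / eps ** max(p, 2)

# For 1 <= p <= 2 both exponents are constant, so the term is d / eps^2.
# For p > 2 it grows to d^{p/2} / eps^p.
for p in (1, 1.5, 2, 3, 4):
    print(p, query_bound(d=100, eps=0.1, p=p))
```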
