
Near-optimal Active Regression of Single-Index Models

Yi Li, Wai Ming Tai

TL;DR

This work studies active regression for single-index models with a Lipschitz activation $f$, where the matrix $A$ is fully accessible but the labels $b$ can be accessed only via entry queries. The authors design a sample-then-solve strategy augmented with a regularized objective $\min_x \|S(f(Ax)-b)\|_p^p + \varepsilon\|Ax\|_p^p$, where $S$ is a sampling matrix. They also use domain shrinking and two-stage sampling to achieve a $(1+\varepsilon)$-approximation with query complexity $\tilde{O}(d^{\frac{p}{2}\vee 1}/\varepsilon^{p\vee 2})$. This bound is tight up to polylogarithmic factors for $1\le p\le 2$, and for $p>2$ its $1/\varepsilon^p$ dependence is shown to be optimal. The analysis combines Lewis-weight-based subspace embeddings, Dudley’s integral, and careful concentration arguments to bound the sampling error uniformly over a controllable domain. A lower-bound framework based on Yao’s minimax theorem establishes near-tightness, while a technique for removing the dependence on $n$ (the length of $b$) yields query bounds that scale primarily with the dimension $d$ and the accuracy $\varepsilon$. Overall, the paper gives the first $(1+\varepsilon)$-approximation guarantees with near-optimal query complexity for single-index regression with nonconvex Lipschitz nonlinearities.
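The sample-then-solve idea can be illustrated on a toy instance. The sketch below is a minimal illustration under simplifying assumptions, not the authors' algorithm. It uses a single round of independent Bernoulli row sampling with uniform probabilities as a stand-in for the Lewis-weight-based two-stage sampling, omits domain shrinking, takes $f$ to be ReLU, and restricts to $d=1$ so that the nonconvex sketched problem can be solved by grid search. `LabelOracle` and `sample_then_solve` are hypothetical names. The point it shows is that $b$ is queried only at sampled rows, while the regularizer $\varepsilon\|Ax\|_p^p$ is computed from $A$ alone.

```python
import random


class LabelOracle:
    """Hypothetical entry oracle: reveals b[i] on request and counts queries."""

    def __init__(self, b):
        self._b = list(b)
        self.queries = 0

    def query(self, i):
        self.queries += 1
        return self._b[i]


def sample_then_solve(A, oracle, f, probs, p, eps, grid, seed=0):
    """Illustrative sample-then-solve for d = 1 (each row of A has one entry)."""
    rng = random.Random(seed)
    # Sample: keep row i independently with probability probs[i];
    # b is queried only at the kept rows.
    kept = [i for i, q in enumerate(probs) if rng.random() < q]
    labels = {i: oracle.query(i) for i in kept}

    def objective(x):
        # ||S(f(Ax) - b)||_p^p: S rescales kept row i by probs[i] ** (-1/p),
        # so each kept term |.|^p is divided by probs[i].
        fit = sum(abs(f(A[i][0] * x) - labels[i]) ** p / probs[i] for i in kept)
        # eps * ||Ax||_p^p depends only on A, so it costs no queries.
        reg = eps * sum(abs(row[0] * x) ** p for row in A)
        return fit + reg

    # Solve: the objective is nonconvex in general; brute-force a 1-D grid.
    return min(grid, key=objective), len(kept)


def relu(t):
    return max(t, 0.0)


# Toy instance: noiseless labels generated with x = 2.
A = [[float(i % 7 - 3)] for i in range(200)]
b = [relu(2.0 * row[0]) for row in A]
oracle = LabelOracle(b)
probs = [0.25] * len(A)  # uniform stand-in for Lewis-weight probabilities
grid = [k / 10 for k in range(-50, 51)]
x_hat, m = sample_then_solve(A, oracle, relu, probs, p=2, eps=1e-3, grid=grid)
print(x_hat, m, oracle.queries)
```

The number of queries equals the number of kept rows `m`, which is a quarter of $n=200$ in expectation here; the entries of $b$ outside the sample are never read.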

Abstract

The active regression problem of the single-index model is to solve $\min_x \lVert f(Ax)-b\rVert_p$, where $A$ is fully accessible and $b$ can only be accessed via entry queries, with the goal of minimizing the number of queries to the entries of $b$. When $f$ is Lipschitz, previous results only obtain constant-factor approximations. This work presents the first algorithm that provides a $(1+\varepsilon)$-approximation solution by querying $\tilde{O}(d^{\frac{p}{2}\vee 1}/\varepsilon^{p\vee 2})$ entries of $b$. This query complexity is also shown to be optimal up to logarithmic factors for $p\in [1,2]$ and the $\varepsilon$-dependence of $1/\varepsilon^p$ is shown to be optimal for $p>2$.
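The query bound $\tilde{O}(d^{\frac{p}{2}\vee 1}/\varepsilon^{p\vee 2})$ switches regimes at $p=2$. The small helper below (with hypothetical names `budget_exponents` and `leading_term`) evaluates only the leading term $d^{\frac{p}{2}\vee 1}/\varepsilon^{p\vee 2}$. It drops the polylogarithmic factors and the $p$-dependent constant, so it indicates how the bound scales, not an actual number of queries.

```python
def budget_exponents(p):
    """(alpha, beta) such that the bound scales as d**alpha / eps**beta."""
    return max(p / 2, 1.0), max(float(p), 2.0)


def leading_term(d, eps, p):
    """d^(p/2 v 1) / eps^(p v 2), ignoring polylog factors and p-dependent constants."""
    alpha, beta = budget_exponents(p)
    return d ** alpha / eps ** beta


for p in (1, 1.5, 2, 3, 4):
    print(p, budget_exponents(p), leading_term(100, 0.5, p))
```

For $p\le 2$ the exponents stay at $(1,2)$, i.e. the bound scales as $d/\varepsilon^2$; for $p>2$ both exponents grow with $p$, e.g. $d^{3/2}/\varepsilon^3$ at $p=3$.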

Paper Structure

This paper contains 43 sections, 30 theorems, 288 equations, 2 figures, 3 algorithms.

Key Result

Theorem 1

There is a randomized algorithm that, given $A\in \mathbb{R}^{n\times d}$, $b\in \mathbb{R}^n$, $f\in \mathsf{Lip}_L$ and an arbitrary sufficiently small $\varepsilon > 0$, with probability at least $0.9$ makes $O\big(d^{1\vee\frac{p}{2}}/\varepsilon^{2\vee p} \cdot \mathsf{poly}\ldots\big)$ … The hidden constant in the bound on the number of queries depends only on $p$.

Figures (2)

  • Figure 1: (left) Plot of the locus $f\big((x,-x)^\top\big)$, where the red (resp. blue) part corresponds to $x\leq 0$ (resp. $x\geq 0$); (middle) $f\big((-6,6)^\top\big)$ is the point on the red part that is closest to $u$ and $v$ in the $\ell_p$-distance; (right) $f\big((6,-6)^\top\big)$ is the point on the blue part that is closest to $u$ and $v$ in the $\ell_p$-distance
  • Figure 2: Illustration of the locus $\gamma$ (left), the minimizers when $x\leq 0$ (middle) and the minimizer when $x\geq 0$ (right)

Theorems & Definitions (50)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Definition 4: $\ell_p$-Lewis weights
  • Lemma 5: Properties of Lewis weights
  • Lemma 6
  • Lemma 7: Dudley's integral (Vers18)
  • Lemma 8
  • Lemma 9
  • Lemma 10
  • ...and 40 more
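Definition 4 concerns $\ell_p$-Lewis weights, which underlie the subspace-embedding part of the analysis. For readers unfamiliar with them, the sketch below computes them for a small dense matrix using the fixed-point characterization $w_i = \big(a_i^\top (A^\top W^{1-2/p} A)^{-1} a_i\big)^{p/2}$ and the iteration of Cohen and Peng, which is known to converge for $p<4$. This is not necessarily how the paper computes or approximates Lewis weights. `solve` and `lewis_weights` are hypothetical helper names, and the code assumes $A$ has full column rank and no zero rows.

```python
def solve(M, v):
    """Solve M z = v (M small, square, nonsingular) by Gauss-Jordan elimination."""
    n = len(M)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def lewis_weights(A, p, iters=100):
    """Approximate l_p Lewis weights by iterating
    w_i <- (a_i^T (A^T W^(1-2/p) A)^(-1) a_i)^(p/2), a contraction for p < 4."""
    n, d = len(A), len(A[0])
    w = [1.0] * n
    for _ in range(iters):
        s = [wk ** (1 - 2 / p) for wk in w]
        # M = A^T diag(s) A, a d x d matrix.
        M = [[sum(s[k] * A[k][i] * A[k][j] for k in range(n)) for j in range(d)]
             for i in range(d)]
        # tau_i = a_i^T M^{-1} a_i, then w_i = tau_i ** (p/2).
        w = [sum(a_j * z_j for a_j, z_j in zip(a, solve(M, a))) ** (p / 2) for a in A]
    return w


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w2 = lewis_weights(A, 2)  # p = 2: Lewis weights are the leverage scores
w1 = lewis_weights(A, 1)
print(w2, w1, sum(w1))
```

At the fixed point the weights sum to $d$ (here $2$), and each weight lies in $(0,1]$ because it is a leverage score of the reweighted matrix $W^{1/2-1/p}A$.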