
Efficient Active Learning Halfspaces with Tsybakov Noise: A Non-convex Optimization Approach

Yinan Li, Chicheng Zhang

TL;DR

This work addresses efficient active learning of $d$-dimensional homogeneous halfspaces under the $(A,\alpha)$-Tsybakov noise condition with well-behaved unlabeled distributions. It introduces a nonconvex objective $L_\sigma$ whose approximate first-order stationary points suffice to recover a near-optimal halfspace, and pairs this with a label-efficient active oracle to achieve a label complexity of $\tilde{O}\left(d (\frac{1}{\epsilon})^{\frac{8-6\alpha}{3\alpha-1}}\right)$ for $\alpha\in(\tfrac{1}{3},1]$. The method combines (i) efficient nonconvex optimization (Active-PSGD), (ii) a label-efficient iterate selection that uses gradient estimates, and (iii) a final label-efficient validation to select between $\pm\hat w$, yielding a provable $(\epsilon,\delta)$-PAC guarantee. This advances the state of the art by expanding the eligible noise range and improving label efficiency, narrowing the gap to information-theoretic bounds and outperforming prior efficient active algorithms in the same regime.
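The optimization step can be illustrated with a minimal sketch of projected SGD on the unit sphere. This is not the paper's Active-PSGD: the label-efficient query oracle, iterate selection, and $\pm\hat w$ validation are omitted, labels are drawn passively and without noise, and the surrogate $\phi_\sigma(t) = 1/(1+e^{t/\sigma})$ is an assumed form chosen for illustration. All function names and hyperparameters below are hypothetical.

```python
import numpy as np

def phi(t, sigma):
    # Assumed smooth surrogate: a decreasing sigmoid of the margin.
    return 1.0 / (1.0 + np.exp(t / sigma))

def grad_L_sigma(w, X, y, sigma):
    """Mini-batch gradient of phi(y * <w, x> / ||w||) at a unit-norm w."""
    margins = y * (X @ w)
    s = phi(margins, sigma)
    # d phi / d margin = -(1/sigma) * s * (1 - s); chain rule brings in y.
    coef = -(s * (1.0 - s) / sigma) * y
    # Normalizing by ||w|| leaves only the component of x orthogonal to w.
    X_perp = X - np.outer(X @ w, w)
    return (coef[:, None] * X_perp).mean(axis=0)

def psgd(sample, w0, sigma=0.5, lr=0.05, steps=2000, batch=64):
    """Projected SGD: gradient step, then renormalize onto the unit sphere."""
    w = w0 / np.linalg.norm(w0)
    for _ in range(steps):
        X, y = sample(batch)
        w = w - lr * grad_L_sigma(w, X, y, sigma)
        w /= np.linalg.norm(w)
    return w

# Toy check: isotropic Gaussian data with noise-free halfspace labels.
rng = np.random.default_rng(0)
d = 5
w_star = np.eye(d)[1]          # hypothetical target halfspace

def sample(n):
    X = rng.standard_normal((n, d))
    return X, np.sign(X @ w_star)

w_hat = psgd(sample, np.eye(d)[0])   # start orthogonal to w_star
angle = np.arccos(np.clip(w_hat @ w_star, -1.0, 1.0))
print(f"angle to w*: {angle:.3f} rad")
```

In this toy setting the iterate starts at angle $\pi/2$ from the target and drifts toward it; the paper's guarantees concern the harder Tsybakov-noise setting, which this sketch does not model.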

Abstract

We study the problem of computationally and label efficient PAC active learning of $d$-dimensional halfspaces with Tsybakov noise~\citep{tsybakov2004optimal} under structured unlabeled data distributions. Inspired by~\cite{diakonikolas2020learning}, we prove that any approximate first-order stationary point of a smooth nonconvex loss function yields a halfspace with a low excess error guarantee. In light of this structural result, we design a nonconvex optimization-based algorithm with a label complexity of $\tilde{O}\left(d (\frac{1}{\epsilon})^{\frac{8-6\alpha}{3\alpha-1}}\right)$, under the assumption that the Tsybakov noise parameter $\alpha\in (\frac{1}{3}, 1]$. This narrows the gap between the label complexities of the previously known efficient passive or active algorithms~\citep{diakonikolas2020polynomial,zhang2021improved} and the information-theoretic lower bound in this setting.
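To get a feel for how the exponent $\frac{8-6\alpha}{3\alpha-1}$ on $\frac{1}{\epsilon}$ behaves across the admissible range, the short sketch below evaluates it exactly with rational arithmetic. The helper name `label_exponent` is hypothetical; only the formula comes from the stated bound.

```python
from fractions import Fraction

def label_exponent(alpha):
    """Exponent of 1/epsilon in the stated label complexity bound."""
    alpha = Fraction(alpha)
    # The bound is stated only for alpha in (1/3, 1].
    if not (Fraction(1, 3) < alpha <= 1):
        raise ValueError("alpha must lie in (1/3, 1]")
    return (8 - 6 * alpha) / (3 * alpha - 1)

for a in (Fraction(1, 2), Fraction(2, 3), Fraction(1)):
    print(f"alpha = {a}: exponent = {label_exponent(a)}")
```

The exponent is 1 at $\alpha = 1$, 4 at $\alpha = \tfrac{2}{3}$, and 10 at $\alpha = \tfrac{1}{2}$, and it grows without bound as $\alpha$ approaches $\tfrac{1}{3}$ from above.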

Paper Structure

This paper contains 23 sections, 22 theorems, 76 equations, 1 table, and 3 algorithms.

Key Result

Lemma 4

Let $D_X$ be a well-behaved distribution, and let $D$ satisfy the $(A, \alpha)$-TNC. Denote $L_\sigma(w) = \mathbb{E}_D \left[\phi_\sigma \left(y \frac{\left\langle w,x \right\rangle }{\|w\|_2}\right)\right]$, where $\phi_\sigma$ is the softmax loss defined above. Let $w$ be such that $\theta(w, w^*) \in (\

Theorems & Definitions (31)

  • Definition 1: Tsybakov noise condition
  • Definition 2: Well-behaved distributions~\citep{diakonikolas2020polynomial}
  • Definition 3
  • Lemma 4
  • Lemma 5
  • Remark 6
  • Remark 7
  • Remark 8
  • Lemma 9
  • Lemma 10
  • ...and 21 more