Distribution-Free Rates in Neyman-Pearson Classification

Mohammadreza M. Kalan, Samory Kpotufe

TL;DR

This paper characterizes distribution-free minimax rates for Neyman-Pearson classification over a fixed VC class $\mathcal{H}$ by introducing a geometric three-points-separation condition that distinguishes easy from hard rate regimes. When $\mathcal{H}$ separates three points, the minimax excess risk decays as $\tilde{\Theta}(n^{-1/2})$ (up to logarithmic factors); otherwise, rates improve to $\tilde{\Theta}(d_{\mathcal{H}}/n)$ or even to zero, depending on whether $\mathcal{H}_{\alpha}(\mu_0)$ has a maximal element. The results extend to the case of unknown $\mu_0$, with slack $\epsilon_0$ and approximate level $\alpha$, and preserve the same dichotomy under additional technical conditions (e.g., finitely supported $\mu_0$). The upper and lower bounds match up to $\log n$ terms, highlighting a fundamental link between hypothesis-class structure and achievable rates in distribution-free Neyman-Pearson problems.

Abstract

We consider the problem of Neyman-Pearson classification, which models unbalanced classification settings where error w.r.t. a distribution $\mu_1$ is to be minimized subject to low error w.r.t. a different distribution $\mu_0$. Given a fixed VC class $\mathcal{H}$ of classifiers to be minimized over, we provide a full characterization of possible distribution-free rates, i.e., minimax rates over the space of all pairs $(\mu_0, \mu_1)$. The rates involve a dichotomy between hard and easy classes $\mathcal{H}$ as characterized by a simple geometric condition, a three-points-separation condition, loosely related to VC dimension.
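The constrained objective described in the abstract can be made concrete with a small sketch. The code below is an illustrative empirical version for one-dimensional threshold classifiers $h_t(x) = 1[x \ge t]$, not the procedure analyzed in the paper: among thresholds whose empirical $\mu_0$-error is at most the level, it returns the one with the smallest empirical $\mu_1$-error. The function name `empirical_np_threshold` and the `slack` argument are hypothetical names introduced here for illustration only.

```python
def empirical_np_threshold(sample0, sample1, alpha, slack=0.0):
    """Empirical Neyman-Pearson selection over thresholds h_t(x) = 1[x >= t].

    sample0: draws from mu_0; the mu_0-error of h_t is the fraction predicted 1.
    sample1: draws from mu_1; the mu_1-error of h_t is the fraction predicted 0.
    alpha:   target level for the mu_0-error.
    slack:   hypothetical tolerance added to alpha (an assumption of this sketch).

    Returns (t, empirical mu_1-error, empirical mu_0-error).
    """
    # Candidate thresholds: every observed point, plus +inf (the all-zero classifier),
    # which always satisfies the constraint, so a feasible candidate always exists.
    candidates = sorted(set(sample0) | set(sample1)) + [float("inf")]
    best = None
    for t in candidates:
        err0 = sum(x >= t for x in sample0) / len(sample0)
        if err0 > alpha + slack:
            continue  # violates the empirical mu_0 constraint
        err1 = sum(x < t for x in sample1) / len(sample1)
        if best is None or err1 < best[1]:
            best = (t, err1, err0)
    return best


if __name__ == "__main__":
    s0 = [i / 10 for i in range(10)]       # 0.0, 0.1, ..., 0.9
    s1 = [0.5 + i / 10 for i in range(10)]  # 0.5, 0.6, ..., 1.4
    print(empirical_np_threshold(s0, s1, alpha=0.2))
```

Lowering `alpha` pushes the selected threshold to the right, trading a smaller empirical $\mu_0$-error for a larger empirical $\mu_1$-error.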

Paper Structure (14 sections, 13 theorems, 37 equations, 3 figures)

Key Result

Proposition 1

For any value of VC dimension $d_\mathcal{H}$ in $\left\{1, 2\right\}$, there exist hypothesis classes $\mathcal{H}$ that satisfy three-points-separation and some that do not.
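As a reminder of what the VC dimension in Proposition 1 measures, here is a minimal brute-force sketch for finite classes on a finite domain. It assumes a non-empty class whose hypotheses are encoded as 0/1 tuples; the function name `vc_dimension` is introduced here for illustration. It does not check three-points-separation, whose formal definition is not part of this excerpt.

```python
from itertools import combinations


def vc_dimension(hypotheses, domain_size):
    """Largest k such that some k-subset of {0, ..., domain_size - 1} is shattered.

    hypotheses: iterable of 0/1 tuples of length domain_size.
    """
    hyps = [tuple(h) for h in hypotheses]
    dim = 0
    for k in range(1, domain_size + 1):
        # A subset is shattered when the class realizes all 2^k labelings on it.
        shattered = any(
            len({tuple(h[i] for i in subset) for h in hyps}) == 2 ** k
            for subset in combinations(range(domain_size), k)
        )
        if not shattered:
            break  # subsets of shattered sets are shattered, so we can stop here
        dim = k
    return dim


if __name__ == "__main__":
    m = 4
    # Thresholds 1[x >= t] and intervals 1[a <= x < b] on the domain {0, 1, 2, 3}.
    thresholds = [tuple(int(x >= t) for x in range(m)) for t in range(m + 1)]
    intervals = [
        tuple(int(a <= x < b) for x in range(m))
        for a in range(m + 1)
        for b in range(a, m + 1)
    ]
    print(vc_dimension(thresholds, m), vc_dimension(intervals, m))
```

The two toy classes in the usage block have VC dimension 1 and 2 respectively, the two values covered by the proposition.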

Figures (3)

  • Figure 1:
  • Figure 2: Illustration of the $\Omega (1/\sqrt{n})$ lower-bound construction in Lemma 4.3. Letting $d\approx d_\mathcal{H}$, each $h\in \mathcal{H}_\alpha(\mu_0)$ picks (i.e., is 1 on) at most $d/2$ points out of $\left\{x_i, x_i'\right\}_{i=1}^{d/2}$. The points $(x_i, x_i')$ are paired so that, effectively, the learner's choice of $h\in \mathcal{H}_{\alpha}(\mu_0)$ reduces to deciding, for each $i$, which of $x_i$ or $x_i'$ has the higher $\mu_1$-mass (randomized in $\sigma_i$). Note that this construction requires three-points-separation: there exist $h, h'$ in $\mathcal{H}_\alpha(\mu_0)$, both $0$ on $x_0$, but differing on $x_i, x_i'$ for some $i$.
  • Figure 3: Lower-bound constructions for cases (ii.a) of Theorems 1 and 2. Theorem 2 (ii.a) starts with a reduction: for any $\mathcal{H}_{\alpha+\epsilon_0} (\mu_0)$ with no maximal element, we construct a new $\mu_0'$ such that $\mathcal{H}_{\alpha+\epsilon_0} (\mu_0') = \mathcal{H}_{\alpha} (\mu'_0)$ and also has no maximal element. Subfigure (a): the main idea is to move $\mu_0$-mass out of regions $\left\{h=1\right\}$, $h \in \mathcal{H}_{\alpha+\epsilon_0} (\mu_0) \setminus \mathcal{H}_{\alpha} (\mu_0)$, carefully to ensure that $\mathcal{H}_{\alpha+\epsilon_0} (\mu_0') = \mathcal{H}_{\alpha+\epsilon_0} (\mu_0)$, i.e., without reducing the mass of regions outside $\mathcal{H}_{\alpha + \epsilon_0}(\mu_0)$. This involves technical corner cases handled in Lemma 4.8. Subfigure (b): A family of distributions $\mu_1$ can then be defined over $\mathcal{H}_{\alpha}(\mu_0')$ that all put the bulk of their mass on a set $A_0 \doteq \left\{h_0=1\right\}$ for some $h_0 \in \mathcal{H}_{\alpha}(\mu_0')$, but differ in where they put the remaining mass of order $n^{-1}$. The learner has to identify where the remaining mass resides.
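The role of the order-$n^{-1}$ remaining mass in Figure 3(b) can be illustrated with a short calculation. This is a sketch of the standard intuition, not the paper's proof. Assuming the remaining mass is exactly $c/n$ for a hypothetical constant $c$, the probability that $n$ i.i.d. draws from $\mu_1$ all miss that region is $(1 - c/n)^n$, which tends to $e^{-c}$. With constant probability, then, the sample contains no direct evidence of where the remaining mass resides.

```python
import math


def prob_region_unseen(n, c=1.0):
    """Probability that n i.i.d. draws all miss a region of mass c / n.

    c is a hypothetical constant (an assumption of this sketch).
    """
    if not 0 < c <= n:
        raise ValueError("need 0 < c <= n so that c / n is a valid mass")
    return (1.0 - c / n) ** n


if __name__ == "__main__":
    # The miss probability approaches exp(-c) as n grows.
    for n in (10, 100, 1000, 10000):
        print(n, prob_region_unseen(n), math.exp(-1.0))
```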

Theorems & Definitions (31)

  • Definition 1
  • Definition 2: vapnik2015uniform
  • Definition 3
  • Definition 4
  • Proposition 1
  • Remark 1: Examples for $d_\mathcal{H} < 3$.
  • Definition 5
  • Theorem 1
  • Example 1
  • Example 2
  • ...and 21 more