Distribution-Free Rates in Neyman-Pearson Classification

Mohammadreza M. Kalan; Samory Kpotufe

TL;DR

This paper characterizes distribution-free minimax rates for Neyman-Pearson classification over a fixed VC class $\mathcal{H}$ by introducing a geometric three-points-separation condition that distinguishes hard from easy classes. When $\mathcal{H}$ separates three points, the minimax excess risk decays as $\tilde{\Theta}(n^{-1/2})$ (up to logarithmic factors); otherwise, the rate improves to $\tilde{\Theta}(d_{\mathcal{H}}/n)$ or even to zero, depending on whether $\mathcal{H}_{\alpha}(\mu_0)$ has a maximal element. The results extend to the case of unknown $\mu_0$, with slack $\epsilon_0$ and approximate level $\alpha$, and preserve the same dichotomy under additional technical conditions (e.g., finitely supported $\mu_0$). The upper and lower bounds match up to $\log n$ terms, highlighting a fundamental link between hypothesis-class structure and achievable rates in distribution-free Neyman-Pearson problems.

Abstract

We consider the problem of Neyman-Pearson classification, which models unbalanced classification settings where error w.r.t. a distribution $\mu_1$ is to be minimized subject to low error w.r.t. a different distribution $\mu_0$. Given a fixed VC class $\mathcal{H}$ of classifiers to be minimized over, we provide a full characterization of possible distribution-free rates, i.e., minimax rates over the space of all pairs $(\mu_0, \mu_1)$. The rates involve a dichotomy between hard and easy classes $\mathcal{H}$ as characterized by a simple geometric condition, a three-points-separation condition, loosely related to VC dimension.
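To make the constrained objective concrete, the following is a minimal sketch of empirical Neyman-Pearson risk minimization over one-dimensional threshold classifiers $h_t(x) = 1\{x \geq t\}$. It is an illustration only, not the procedure analyzed in the paper: the function name `np_erm_thresholds`, the choice of threshold class, and the `slack` parameter (loosely playing the role of $\epsilon_0$ when $\mu_0$ is only observed through samples) are all assumptions made here for the example.

```python
import math


def np_erm_thresholds(x0, x1, alpha, slack=0.0):
    """Empirical Neyman-Pearson selection over thresholds h_t(x) = 1{x >= t}.

    x0: samples assumed drawn from mu_0 (class 0).
    x1: samples assumed drawn from mu_1 (class 1).
    Returns (t, err0, err1): the threshold with the smallest empirical
    mu_1-error among those whose empirical mu_0-error is <= alpha + slack.
    """
    # Candidate thresholds: every sample point, plus +inf, which gives the
    # all-zero classifier and therefore always satisfies the constraint.
    candidates = list(x0) + list(x1) + [math.inf]
    best = None
    for t in candidates:
        err0 = sum(x >= t for x in x0) / len(x0)  # empirical mu_0 mass of {h = 1}
        err1 = sum(x < t for x in x1) / len(x1)   # empirical mu_1 mass of {h = 0}
        feasible = err0 <= alpha + slack
        if feasible and (best is None or err1 < best[2]):
            best = (t, err0, err1)
    return best


if __name__ == "__main__":
    x0 = [0, 1, 2, 3]
    x1 = [2.5, 3.5, 4, 5]
    print(np_erm_thresholds(x0, x1, alpha=0.25))
    print(np_erm_thresholds(x0, x1, alpha=0.0))
```

Tightening `alpha` shrinks the feasible set of thresholds, which is the empirical analogue of restricting attention to $\mathcal{H}_{\alpha}(\mu_0)$; the excess risk studied in the paper compares the $\mu_1$-error of the selected classifier with the best achievable within that set.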

Paper Structure (14 sections, 13 theorems, 37 equations, 3 figures)

Introduction
Setup and Definitions
Main Results
Known $\mu_0$, Exact Level $\alpha$
Unknown $\mu_0$, Approximate Level $\alpha$
Overview of Analysis
Supporting Upper-Bounds
Supporting Lower-Bounds
Proof of Theorem \ref{thm1}: Exact Level $\alpha$
Proof of Theorem \ref{thm2}: Approximate Level $\alpha$
Proofs
Supporting Results
Proofs of Section \ref{sec: Generic Upper-Bounds} (Upper-Bounds)
Proofs of Section \ref{sec: Minimax Lower-Bounds} (Lower-Bounds)

Key Result

Proposition 1

For any value of VC dimension $d_\mathcal{H}$ in $\left\{1, 2\right\}$, there exist hypothesis classes $\mathcal{H}$ that satisfy three-points-separation and some that do not.

Figures (3)

Figure 1:
Figure 2: Illustration of the $\Omega(1/\sqrt{n})$ lower-bound construction in Lemma \ref{4.3}. Letting $d \approx d_\mathcal{H}$, each $h \in \mathcal{H}_\alpha(\mu_0)$ picks (i.e., is 1 on) at most $d/2$ points out of $\left\{x_i, x_i'\right\}_{i=1}^{d/2}$. The points $(x_i, x_i')$ are paired so that, effectively, the learner's choice of $h \in \mathcal{H}_{\alpha}(\mu_0)$ reduces to deciding, for each $i$, which of $x_i$ or $x_i'$ has the higher $\mu_1$-mass (randomized in $\sigma_i$). Note that this construction requires three-points-separation: there exist $h, h' \in \mathcal{H}_\alpha(\mu_0)$, both $0$ on $x_0$, but differing on $x_i, x_i'$ for some $i$.
Figure 3: Lower-bound constructions for cases (ii.a) of Theorems \ref{thm1} and \ref{thm2}. Theorem \ref{thm2} (ii.a) starts with a reduction: for any $\mathcal{H}_{\alpha+\epsilon_0}(\mu_0)$ with no maximal element, we construct a new $\mu_0'$ such that $\mathcal{H}_{\alpha+\epsilon_0}(\mu_0') = \mathcal{H}_{\alpha}(\mu_0')$ and also has no maximal element. Subfigure (a): the main idea is to move $\mu_0$-mass out of regions $\left\{h=1\right\}$, $h \in \mathcal{H}_{\alpha+\epsilon_0}(\mu_0) \setminus \mathcal{H}_{\alpha}(\mu_0)$, carefully to ensure that $\mathcal{H}_{\alpha+\epsilon_0}(\mu_0') = \mathcal{H}_{\alpha+\epsilon_0}(\mu_0)$, i.e., without reducing the mass of regions outside $\mathcal{H}_{\alpha+\epsilon_0}(\mu_0)$. This involves technical corner cases handled in Lemma \ref{lemma4.8}. Subfigure (b): a family of distributions $\mu_1$ can then be defined over $\mathcal{H}_{\alpha}(\mu_0')$ that all put the bulk of their mass on a set $A_0 \doteq \left\{h_0=1\right\}$ for some $h_0 \in \mathcal{H}_{\alpha}(\mu_0')$, but differ in where they put the remaining mass of order $n^{-1}$. The learner has to identify where the remaining mass resides.

Theorems & Definitions (31)

Definition 1
Definition 2: vapnik2015uniform
Definition 3
Definition 4
Proposition 1
Remark 1: Examples for $d_\mathcal{H} < 3$.
Definition 5
Theorem 1
Example 1
Example 2
...and 21 more
