Online Consistency of the Nearest Neighbor Rule

Sanjoy Dasgupta, Geelon So

TL;DR

Online consistency of the nearest neighbor rule is proved for all measurable functions in doubling metric spaces, under the mild assumption that the instances are generated by a process that is uniformly absolutely continuous with respect to a finite, upper doubling measure.

Abstract

In the realizable online setting, a learner is tasked with making predictions for a stream of instances, where the correct answer is revealed after each prediction. A learning rule is online consistent if its mistake rate eventually vanishes. The nearest neighbor rule (Fix and Hodges, 1951) is a fundamental prediction strategy, but it is only known to be consistent under strong statistical or geometric assumptions: the instances come i.i.d. or the label classes are well-separated. We prove online consistency for all measurable functions in doubling metric spaces under the mild assumption that the instances are generated by a process that is uniformly absolutely continuous with respect to a finite, upper doubling measure.
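
As a sketch of the consistency notion described above, write $\hat{Y}_n$ for the learner's prediction at round $n$ and $\eta$ for the target function (both symbols are notation assumed here for illustration; the abstract does not fix them). The requirement that the mistake rate eventually vanishes can then be written as

$$\frac{1}{N} \sum_{n=1}^{N} \mathbbm{1}\{\hat{Y}_n \neq \eta(X_n)\} \;\longrightarrow\; 0 \quad \text{as } N \to \infty.$$

When the instances are random, this convergence would presumably be required to hold almost surely; that mode of convergence is an assumption of this sketch rather than something stated in the abstract.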

Paper Structure

This paper contains 34 sections, 36 theorems, 115 equations, 3 figures, 1 algorithm.

Key Result

Proposition 2.1

Let $(\mathcal{X}, \rho)$ be a totally bounded metric space. Given $\eta : \mathcal{X} \to \mathcal{Y}$, there is a sequence of instances $(X_n)_n$ on which the nearest neighbor rule is not online consistent on $\eta$ if and only if there is no positive separation between classes: $\inf_{\eta(x) \neq \eta(x')} \rho(x, x') = 0$.

Figures (3)

  • Figure 1: Learning the threshold $\mathbbm{1}\{x \geq 0\}$ on $\mathbb{R}$. The nearest neighbor classifier makes a mistake every single round on the sequence $X_n = (-1/3)^n$, where subsequent test points alternate sign.
  • Figure 2: (Left) A visualization of a mutually-labeling set (orange ball). There are two classes (dark and light gray) separated by a decision boundary (blue line). The length of the dashed line measures the margin of a point in the set. The length of the dotted line is bounded above by the diameter of the set. (Right) An example of a collection of mutually-labeling sets (dark and light blue balls) covering all but a region of small mass. Since the nearest neighbor rule makes at most one mistake per ball, eventually all mistakes must come from the white, uncovered region.
  • Figure 3: The left, middle, and right panels show the cover trees for the first three indicated instances $(X_{\tau_1}, X_{\tau_2}, X_{\tau_3})$, along with their metric and measure bound trade-offs. Each concentric disk corresponds to a ball in the cover tree, and the orange disk corresponds to a tail for an indicated instance. The tails are chosen so that the $\nu$-mass of the orange region remains bounded by $\delta$.
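
The failure described in the Figure 1 caption can be checked numerically. The following is a minimal sketch, not the authors' code: the function names `eta` and `nearest_neighbor_mistakes` are hypothetical, the horizon of 30 rounds is arbitrary, and the first round is skipped because the learner has no labeled history yet (an assumption of this sketch).

```python
from fractions import Fraction


def eta(x):
    """Threshold label 1{x >= 0}, as in the Figure 1 caption."""
    return 1 if x >= 0 else 0


def nearest_neighbor_mistakes(instances, label):
    """Run the online 1-nearest-neighbor rule on a stream of real numbers.

    Returns one mistake flag per round after the first. Round 1 has no
    labeled history, so it is skipped (an assumption of this sketch).
    """
    history = []   # (instance, revealed label) pairs seen so far
    mistakes = []
    for x in instances:
        if history:
            # Predict with the label of the closest previously seen instance.
            _, predicted = min(history, key=lambda pair: abs(pair[0] - x))
            mistakes.append(predicted != label(x))
        # The correct label is revealed only after the prediction is made.
        history.append((x, label(x)))
    return mistakes


# Exact rationals avoid floating-point issues as the points shrink toward 0.
stream = [Fraction(-1, 3) ** n for n in range(1, 31)]
mistakes = nearest_neighbor_mistakes(stream, eta)
mistake_rate = sum(mistakes) / len(mistakes)
print(mistake_rate)
```

In this stream, the nearest previously seen point to $X_n$ is always $X_{n-1}$, which lies on the opposite side of the threshold, so every scored round is a mistake and the mistake rate does not vanish.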

Theorems & Definitions (76)

  • Example 2.1: Failing to learn a threshold
  • Proposition 2.1: Non-convergence in the worst-case
  • Definition 3.1: Ergodic continuity
  • Definition 3.2: Uniform absolute continuity
  • Lemma 3.2
  • Definition 4.1: Boundary point
  • Theorem 4.2: Online consistency for $\mathcal{F}_0$
  • Definition 4.3: Mutually-labeling set
  • Lemma 4.3
  • Lemma 4.3
  • ...and 66 more