A signal separation view of classification

H. N. Mhaskar, Ryan O'Dowd

Abstract

The problem of classification in machine learning has often been approached in terms of function approximation. In this paper, we propose an alternative approach for classification in arbitrary compact metric spaces which, in theory, yields both the number of classes and a perfect classification using a minimal number of queried labels. Our approach uses localized trigonometric polynomial kernels initially developed for the point source signal separation problem in signal processing. Rather than point sources, we argue that the various classes come from different probability measures. The localized kernel technique developed for separating point sources is then shown to separate the supports of these distributions. This is done in a hierarchical manner in our MASC algorithm to accommodate touching or overlapping class boundaries. We illustrate our theory on several simulated and real-life datasets, including the Salinas and Indian Pines hyperspectral datasets and a document dataset.

Paper Structure

This paper contains 21 sections, 15 theorems, 107 equations, 18 figures, 4 tables, 1 algorithm.

Key Result

Theorem 5.1

Let $\mu$ be detectable and suppose $M\gtrsim n^{\alpha}\log(n)$. Let $\{x_1,x_2,\dots,x_M\}$ be independent samples from $\mu$. There exists a constant $C>0$ such that if $\Theta<C<1$, then there exists $r(\Theta)\sim \Theta^{-1/(S-\alpha)}$ (recall $S$ is the localization parameter of the kernel,

Figures (18)

  • Figure 1: Visualization of our main theorem. Top: Supports of two classes with no minimal separation. Bottom: The two classes are separated into sets $\mathbf{S}_{1,\eta},\mathbf{S}_{2,\eta}$ (blue and green) with separation $2\eta$ by removing a remainder set $\mathbf{S}_{3,\eta}$ (red). Our theorem gives conditions for when our support estimation sets $\mathcal{G}_{1,\eta,n}(\Theta),\mathcal{G}_{2,\eta,n}(\Theta)$ (light blue and light green) have separation $\eta$ and are close estimates of $\mathbf{S}_{1,\eta},\mathbf{S}_{2,\eta}$ respectively.
  • Figure 2: Left: $|\sigma_{256}(\mu)(x)|$ has peaks at the points $-1, 2, 2.05$, and is small everywhere else. Vertical red lines indicate the positions of these points. Right: A close-up view of $|\sigma_{256}(\mu)(x)|$ near $x=2$ to show an accurate detection of the close-by points $2, 2.05$. Bottom: A close-up view of $|\sigma_{64}(\mu)(x)|$ near $x=2$ to show the non-detection of the close-by points $2, 2.05$.
  • Figure 3: Normalized histogram of the density of interest (left), paired with our density estimation by $\sigma_{128}$ based on $3900$ samples (right).
  • Figure 4: Demonstration of the support estimation set $\mathcal{G}_{32}(0.15)$ (yellow) applied to a simple two-moons data set from twomoons (blue and red). By querying one point from each component of the support estimation set and extending the label to the other points in the same component, we can classify the entire data set with 100% accuracy.
  • Figure 5: This figure illustrates the result of applying MASC to a synthetic circle and ellipse data set. On the left are true labels of the given data, and on the right is the estimation attained by MASC.
  • ...and 13 more figures
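
The idea in the Figure 4 caption (estimate the support, then query one label per connected component and extend it to the rest of the component) can be sketched in a few lines. This is only an illustrative toy, not the MASC algorithm: a Gaussian kernel stands in for the paper's localized trigonometric polynomial kernel, and the grid data, the outlier, the function names `kernel_sum` and `components`, and the parameter values `n=4`, threshold `0.5`, and `eta=0.3` are all assumptions chosen for the example.

```python
import numpy as np

def kernel_sum(points, n):
    """Sum of a Gaussian stand-in kernel over all samples, evaluated at each sample."""
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-(n ** 2) * sq_dist).sum(axis=1)

def components(points, eta):
    """Label connected components of the graph linking points at distance <= eta."""
    m = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    comp = -np.ones(m, dtype=int)
    current = 0
    for start in range(m):
        if comp[start] != -1:
            continue
        comp[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            # Unvisited neighbours of i join the current component.
            for j in np.flatnonzero((dist[i] <= eta) & (comp == -1)):
                comp[j] = current
                stack.append(j)
        current += 1
    return comp

# Toy data (hypothetical): two 5x5 grids far apart, plus one isolated outlier.
axis = np.arange(5) * 0.25
gx, gy = np.meshgrid(axis, axis)
cluster_a = np.column_stack([gx.ravel(), gy.ravel()])
cluster_b = cluster_a + 5.0
outlier = np.array([[2.5, 2.5]])
points = np.vstack([cluster_a, cluster_b, outlier])
true_labels = np.array([0] * 25 + [1] * 25 + [-1])

# Support estimation: keep samples whose kernel sum is a fixed fraction of the maximum.
scores = kernel_sum(points, n=4)
keep = scores >= 0.5 * scores.max()

# One label query per component, extended to every point in that component.
comp = components(points[keep], eta=0.3)
kept_idx = np.flatnonzero(keep)
predicted = -np.ones(len(points), dtype=int)
num_queries = 0
for c in np.unique(comp):
    members = kept_idx[comp == c]
    predicted[members] = true_labels[members[0]]  # simulated oracle query
    num_queries += 1

accuracy = np.mean(predicted[keep] == true_labels[keep])
print(num_queries, accuracy)
```

In this toy setup the thresholding step discards the isolated point, and the two retained components are far enough apart that a single query per component suffices. Touching or overlapping classes would not split this cleanly, which is the case the hierarchical scheme described in the abstract is meant to handle.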

Theorems & Definitions (36)

  • Example 3.1
  • Remark 3.1
  • Example 3.2
  • Definition 4.1
  • Definition 4.2
  • Remark 4.1
  • Example 4.1
  • Example 4.2
  • Remark 4.2
  • Remark 5.1
  • ...and 26 more