A result relating convex n-widths to covering numbers with some applications to neural networks

Jonathan Baxter, Peter Bartlett

TL;DR

This paper investigates when high-dimensional function classes admit low-dimensional representations via small feature sets. It introduces the convex core and its $\varepsilon$-covering number $N_{co}(\varepsilon, K)$ as a unifying measure linking approximation error to combinatorial complexity. The central result shows $c_n(K) \le \varepsilon$ whenever $n \ge N_{co}(\varepsilon, K)$ and that the bound is tight up to a gap of 1, with Sobolev-space and other examples illustrating limits. Applying this to one-hidden-layer neural networks, the authors derive practical upper bounds on approximation rates for node classes, including VC-classes, linear threshold, and smoothly parameterized families.
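
To make the notion of an $\varepsilon$-covering number concrete, here is a minimal sketch that greedily builds an $\varepsilon$-cover of a finite set of vectors. It does not construct the convex core, which this summary does not define. The function names `euclidean` and `greedy_cover`, the Euclidean metric, and the finite point set are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def euclidean(u: Point, v: Point) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def greedy_cover(points: Sequence[Point], eps: float) -> List[Point]:
    """Pick centers so that every point lies within eps of some center.

    The number of centers returned is an upper bound on the eps-covering
    number of the finite set `points` (with centers drawn from the set).
    This is an illustrative helper, not a routine from the paper.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    centers: List[Point] = []
    for p in points:
        # Keep p as a new center only if no existing center already covers it.
        if all(euclidean(p, c) > eps for c in centers):
            centers.append(p)
    return centers


if __name__ == "__main__":
    sample = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.05, 0.0), (3.0, 0.0)]
    for eps in (0.2, 5.0):
        print(eps, len(greedy_cover(sample, eps)))
```

A greedy pass gives only an upper bound on the covering number, since a cleverer choice of centers may need fewer balls. It suffices to show how the count shrinks as $\varepsilon$ grows.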

Abstract

In general, approximating classes of functions defined over high-dimensional input spaces by linear combinations of a fixed set of basis functions or ``features'' is known to be hard. Typically, the worst-case error of the best basis set decays only as fast as $\Theta(n^{-1/d})$, where $n$ is the number of basis functions and $d$ is the input dimension. However, there are many examples of high-dimensional pattern recognition problems (such as face recognition) where linear combinations of small sets of features do solve the problem well. Hence these function classes do not suffer from the ``curse of dimensionality'' associated with more general classes. It is natural then, to look for characterizations of high-dimensional function classes that nevertheless are approximated well by linear combinations of small sets of features. In this paper we give a general result relating the error of approximation of a function class to the covering number of its ``convex core''. For one-hidden-layer neural networks, covering numbers of the class of functions computed by a single hidden node upper bound the covering numbers of the convex core. Hence, using standard results we obtain upper bounds on the approximation rate of neural network classes.
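
The $n^{-1/d}$ rate quoted above can be turned into a rough count of basis functions. The sketch below assumes the constant hidden in the $\Theta(\cdot)$ is exactly 1, which the abstract does not state, and uses a hypothetical helper name `basis_functions_needed`. Under that assumption, reaching error $\varepsilon$ requires $n \ge \varepsilon^{-d}$, so for $\varepsilon = 1/2$ the count is $2^d$.

```python
import math


def basis_functions_needed(eps: float, d: int) -> int:
    """Smallest integer n with n ** (-1/d) <= eps.

    Assumes the error is exactly n ** (-1/d), i.e. the constant in the
    Theta(n^{-1/d}) rate is taken to be 1 (an illustrative assumption).
    """
    if not 0 < eps <= 1:
        raise ValueError("eps must lie in (0, 1]")
    if d < 1:
        raise ValueError("d must be a positive integer")
    # n ** (-1/d) <= eps  is equivalent to  n >= eps ** (-d).
    return math.ceil(eps ** (-d))


if __name__ == "__main__":
    # With eps = 1/2 the requirement doubles with every extra input dimension.
    for d in (1, 2, 10, 20):
        print(d, basis_functions_needed(0.5, d))
```

The exponential growth in $d$ is the ``curse of dimensionality'' the abstract refers to. The classes studied in the paper are of interest because they avoid it.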

Paper Structure

This paper contains 8 sections, 5 theorems, 31 equations.

Key Result

Theorem 1

For any set $K \subseteq X$ and for all $\varepsilon > 0$,

Theorems & Definitions (8)

  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 1
  • Lemma 2
  • Theorem 3
  • Lemma 4
  • Lemma 5