A result relating convex n-widths to covering numbers with some applications to neural networks

Jonathan Baxter; Peter Bartlett

TL;DR

This paper investigates when high-dimensional function classes admit low-dimensional representations via small feature sets. It introduces the convex core and its ε-covering number N_co(ε,K) as a unifying measure linking approximation error to combinatorial complexity. The central result shows c_n(K) ≤ ε whenever n ≥ N_co(ε,K) and that the bound is tight up to a gap of 1, with Sobolev-space and other examples illustrating limits. Applying this to one-hidden-layer neural networks, the authors derive practical upper bounds on approximation rates for node classes, including VC-classes, linear threshold, and smoothly parameterized families.

Abstract

In general, approximating classes of functions defined over high-dimensional input spaces by linear combinations of a fixed set of basis functions or ``features'' is known to be hard. Typically, the worst-case error of the best basis set decays only as fast as $\Theta(n^{-1/d})$, where $n$ is the number of basis functions and $d$ is the input dimension. However, there are many examples of high-dimensional pattern recognition problems (such as face recognition) where linear combinations of small sets of features do solve the problem well. Hence these function classes do not suffer from the ``curse of dimensionality'' associated with more general classes. It is natural then, to look for characterizations of high-dimensional function classes that nevertheless are approximated well by linear combinations of small sets of features. In this paper we give a general result relating the error of approximation of a function class to the covering number of its ``convex core''. For one-hidden-layer neural networks, covering numbers of the class of functions computed by a single hidden node upper bound the covering numbers of the convex core. Hence, using standard results we obtain upper bounds on the approximation rate of neural network classes.
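To make the ``curse of dimensionality'' in the abstract concrete, the following minimal sketch inverts the $n^{-1/d}$ rate to estimate how many basis functions a target error would require. It is an illustration only: the function name `basis_functions_needed` and the constant `c` are hypothetical, and the rate is treated as an exact equality $\varepsilon = c\,n^{-1/d}$, whereas the $\Theta$ notation fixes it only up to unspecified constants.

```python
import math


def basis_functions_needed(eps: float, d: int, c: float = 1.0) -> int:
    """Smallest n with c * n**(-1/d) <= eps, i.e. n >= (c / eps)**d.

    Assumption (not from the paper): the worst-case error is exactly
    c * n**(-1/d); the constant c is a placeholder for illustration.
    """
    if eps <= 0 or d <= 0 or c <= 0:
        raise ValueError("eps, d and c must be positive")
    return math.ceil((c / eps) ** d)


if __name__ == "__main__":
    # Fixed target error, growing input dimension d.
    for d in (1, 2, 10, 20):
        print(d, basis_functions_needed(0.5, d))
```

Under these assumptions the required number of basis functions grows as $(c/\varepsilon)^d$, that is, exponentially in the input dimension for any fixed target error. Function classes that avoid this growth are the ones the paper sets out to characterize.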

Paper Structure

Table of Contents

Key Result

Theorems & Definitions (8)