A result relating convex n-widths to covering numbers with some applications to neural networks
Jonathan Baxter, Peter Bartlett
TL;DR
This paper investigates when high-dimensional function classes admit low-dimensional representations via small feature sets. It introduces the convex core and its ε-covering number N_co(ε,K) as a unifying measure linking approximation error to combinatorial complexity. The central result shows that the convex n-width satisfies c_n(K) ≤ ε whenever n ≥ N_co(ε,K), and that this bound is tight up to a gap of 1. Sobolev-space and other examples illustrate the limits of the result. Applying this to one-hidden-layer neural networks, the authors derive practical upper bounds on approximation rates for several node classes, including VC-classes, linear threshold functions, and smoothly parameterized families.
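As a rough illustration of the ε-covering number that appears in the bound above, the sketch below greedily builds an ε-cover of a finite set of functions sampled on a grid, using the sup-norm distance. It is a toy under stated assumptions: the convex core K is not defined in this summary, so the code works on an arbitrary finite set; the grid, the norm, and the names `sup_distance` and `greedy_eps_cover` are our own illustrative choices; and greedy selection yields only an upper bound on a covering number, not N_co(ε,K) itself.

```python
from typing import List, Sequence

# Assumption: a function is represented by its values on a fixed finite grid.
Function = Sequence[float]


def sup_distance(f: Function, g: Function) -> float:
    """Sup-norm distance between two functions sampled on the same grid."""
    return max(abs(a - b) for a, b in zip(f, g))


def greedy_eps_cover(functions: Sequence[Function], eps: float) -> List[Function]:
    """Return centers such that every input function lies within eps of some center.

    len(result) is an upper bound on the eps-covering number of the finite set.
    """
    centers: List[Function] = []
    for f in functions:
        # Open a new ball only if f is not already covered by an existing center.
        if all(sup_distance(f, c) > eps for c in centers):
            centers.append(f)
    return centers


# Toy class: linear functions x -> w * x on the grid {0, 0.5, 1}, with w in {0, 0.1, ..., 1}.
grid = [0.0, 0.5, 1.0]
functions = [[w / 10 * x for x in grid] for w in range(11)]
cover = greedy_eps_cover(functions, eps=0.25)
print(len(cover))  # 4
```

Because each new center is more than ε from every earlier one, the centers are also ε-separated, so the greedy count lies between the ε-covering number and the ε-packing number of the set.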
Abstract
In general, approximating classes of functions defined over high-dimensional input spaces by linear combinations of a fixed set of basis functions or "features" is known to be hard. Typically, the worst-case error of the best basis set decays only as fast as $\Theta(n^{-1/d})$, where $n$ is the number of basis functions and $d$ is the input dimension. However, there are many examples of high-dimensional pattern recognition problems (such as face recognition) where linear combinations of small sets of features do solve the problem well. Hence these function classes do not suffer from the "curse of dimensionality" associated with more general classes. It is natural, then, to look for characterizations of high-dimensional function classes that are nevertheless approximated well by linear combinations of small sets of features. In this paper we give a general result relating the error of approximation of a function class to the covering number of its "convex core". For one-hidden-layer neural networks, covering numbers of the class of functions computed by a single hidden node upper bound the covering numbers of the convex core. Hence, using standard results, we obtain upper bounds on the approximation rate of neural network classes.
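To make the $\Theta(n^{-1/d})$ rate concrete: if the error behaved exactly like $n^{-1/d}$ with the constants dropped (a simplifying assumption), reaching error $\epsilon$ would require $n = \epsilon^{-d}$ basis functions, which grows exponentially in $d$. The short sketch below tabulates this; the function name `basis_size_for_error` is illustrative only.

```python
def basis_size_for_error(eps: float, d: int) -> float:
    """Solve n ** (-1 / d) == eps for n, ignoring the constants hidden in Theta."""
    return eps ** (-d)


# With eps = 0.5 the required n is 2**d, so it doubles with every extra input dimension.
for d in (1, 2, 5, 10):
    print(d, basis_size_for_error(0.5, d))
# 1 2.0
# 2 4.0
# 5 32.0
# 10 1024.0
```

At $\epsilon = 0.1$ and $d = 10$ the same formula already asks for about $10^{10}$ basis functions, which is the curse of dimensionality the abstract refers to.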
