Breaking the Curse of Dimensionality with Convex Neural Networks
Francis Bach
TL;DR
The paper develops a convex-optimization framework for single-hidden-layer neural networks with non-decreasing, positively homogeneous activations, which allows learning from a continuum of basis functions. By introducing the variation norm $\gamma_1$ and its associated convex geometry, it proves adaptive generalization bounds that exploit low-dimensional linear structure, and shows how high-dimensional nonlinear variable selection can arise under $\ell_1$ penalties on the input weights. It also connects to reproducing kernel Hilbert spaces (RKHS) via $\gamma_2$, which enables kernel-based approximations, while highlighting hardness results for the incremental Frank-Wolfe (FW) step and offering convex relaxations (e.g., SDP, sign-vector) that preserve approximation guarantees in some regimes. Overall, the work gives theoretical insight into the approximation, generalization, and computational aspects of convex neural networks, with explicit rates that depend on the intrinsic data subspace, and identifies a trade-off between adaptivity and tractability.
Abstract
We consider neural networks with a single hidden layer and non-decreasing homogeneous activation functions like the rectified linear units. By letting the number of hidden units grow unbounded and using classical non-Euclidean regularization tools on the output weights, we provide a detailed theoretical analysis of their generalization performance, with a study of both the approximation and the estimation errors. We show in particular that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a low-dimensional subspace. Moreover, when using sparsity-inducing norms on the input weights, we show that high-dimensional non-linear variable selection may be achieved, without any strong assumption regarding the data and with a total number of variables potentially exponential in the number of observations. In addition, we provide a simple geometric interpretation of the non-convex problem of adding a new unit, which is the core, potentially hard computational element in the framework of learning from continuously many basis functions. We provide simple conditions for convex relaxations to achieve the same generalization error bounds, even when constant-factor approximations cannot be found (e.g., because the problem is NP-hard, as for the zero-homogeneous activation function). We were not able to find strong enough convex relaxations and leave open the existence or non-existence of polynomial-time algorithms.
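
To make the role of homogeneity concrete, here is a minimal sketch in plain Python; it is an illustration under stated assumptions, not the paper's algorithm. It assumes the rectified linear unit as the activation and uses hypothetical helper names (`predict`, `normalize_units`, `l1_norm`). Because the activation is positively homogeneous, each input-weight vector can be rescaled to unit Euclidean norm while its output weight absorbs the scale, leaving the predictor unchanged. This is one way to see why regularization can be placed on the output weights alone.

```python
import math
import random


def relu(u):
    """Rectified linear unit: positively homogeneous of degree 1."""
    return max(u, 0.0)


def predict(x, hidden, output):
    """Single-hidden-layer predictor f(x) = sum_j eta_j * relu(w_j . x)."""
    return sum(
        eta * relu(sum(wi * xi for wi, xi in zip(w, x)))
        for w, eta in zip(hidden, output)
    )


def normalize_units(hidden, output):
    """Rescale every input-weight vector to unit Euclidean norm.

    The scale is moved into the output weight, which is valid because
    relu(c * u) == c * relu(u) for every c > 0.
    """
    new_hidden, new_output = [], []
    for w, eta in zip(hidden, output):
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm == 0.0:
            continue  # a zero unit contributes nothing, since relu(0) == 0
        new_hidden.append([wi / norm for wi in w])
        new_output.append(eta * norm)
    return new_hidden, new_output


def l1_norm(output):
    """l1 norm of the output weights (the penalized quantity in this sketch)."""
    return sum(abs(eta) for eta in output)


# Hypothetical toy network: d inputs, k hidden units, random weights.
rng = random.Random(0)
d, k = 5, 8
hidden = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
output = [rng.gauss(0.0, 1.0) for _ in range(k)]
x = [rng.gauss(0.0, 1.0) for _ in range(d)]

unit_hidden, unit_output = normalize_units(hidden, output)
before = predict(x, hidden, output)
after = predict(x, unit_hidden, unit_output)
```

Here `before` and `after` agree up to floating-point error, while `l1_norm(unit_output)` is the penalty measured once all input weights lie on the unit sphere. Letting the number of units grow unbounded replaces the finite sum with an integral against a signed measure over input weights, which the sketch does not attempt to represent.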
