A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features
Emi Zeger, Yifei Wang, Aaron Mishkin, Tolga Ergen, Emmanuel Candès, Mert Pilanci
TL;DR
The paper shows that training deep neural networks on 1-D data can be recast as a convex Lasso problem with an explicit dictionary, enabling global-optimality analysis and tractable solution paths. The dictionary grows with depth and encodes piecewise-linear features, with reflection features appearing for ReLU and absolute value activations once depth is at least three; sign and threshold activations, in contrast, yield dictionaries lacking reflections. The paper also provides explicit dictionaries for two-layer and deeper architectures, reconstruction maps from Lasso solutions to optimal networks, and polynomial or combinatorial bounds on dictionary sizes. Empirically, the convex reformulation via Lasso (cvxNN) achieves competitive training loss and generalization in autoregressive time-series tasks, corroborating the theoretical predictions and offering a scalable training paradigm for low-dimensional data.
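The idea of replacing training with a Lasso over an explicit dictionary can be illustrated with a minimal sketch. It assumes a ReLU-style two-layer dictionary whose columns are the ramps (x - a_j)_+ and (a_j - x)_+, with breakpoints a_j at the training points, plus an unpenalized bias ξ, and it solves min over (z, ξ) of 0.5||Az + ξ1 - y||² + β||z||₁ with plain proximal gradient descent (ISTA). The function names `ramp_dictionary` and `lasso_ista`, the ISTA solver, and the step-size rule are illustrative assumptions, not the authors' cvxNN implementation.

```python
import numpy as np

def ramp_dictionary(x, a):
    """Assumed ReLU dictionary: columns (x - a_j)_+ and (a_j - x)_+ for each breakpoint a_j."""
    diff = x[:, None] - a[None, :]  # shape (n, m): x_i - a_j
    return np.hstack([np.maximum(diff, 0.0), np.maximum(-diff, 0.0)])

def lasso_ista(A, y, beta, iters=20000):
    """Minimize 0.5*||A z + b - y||^2 + beta*||z||_1 with an unpenalized bias b.

    ISTA is our solver choice for illustration, not the paper's method.
    """
    n, p = A.shape
    z = np.zeros(p)
    b = 0.0
    # Step size 1/L, where L is the squared spectral norm of [A, 1],
    # i.e. the Lipschitz constant of the gradient of the smooth term.
    A1 = np.hstack([A, np.ones((n, 1))])
    L = np.linalg.norm(A1, 2) ** 2
    for _ in range(iters):
        r = A @ z + b - y                      # residual at the current iterate
        z = z - (A.T @ r) / L                  # gradient step on the weights
        z = np.sign(z) * np.maximum(np.abs(z) - beta / L, 0.0)  # soft-threshold (prox of l1)
        b = b - r.sum() / L                    # gradient step on the unpenalized bias
    return z, b
```

For example, with `x = np.array([-1.0, 0.0, 1.0, 2.0])` and `y = np.abs(x)`, calling `ramp_dictionary(x, x)` gives a 4 × 8 matrix, and `lasso_ista(A, y, beta=1e-3)` returns sparse weights whose nonzero entries pick out the breakpoints of a piecewise-linear fit, which a reconstruction map would then turn back into network neurons.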
Abstract
We prove that training neural networks on 1-D data is equivalent to solving convex Lasso problems with discrete, explicitly defined dictionary matrices. We consider neural networks with piecewise linear activations and depths ranging from 2 to an arbitrary but finite number of layers. We first show that two-layer networks with piecewise linear activations are equivalent to Lasso models using a discrete dictionary of ramp functions, with breakpoints corresponding to the training data points. In certain general architectures with absolute value or ReLU activations, a third layer surprisingly creates features that reflect the training data about themselves. Additional layers progressively generate reflections of these reflections. The Lasso representation provides valuable insights into the analysis of globally optimal networks, elucidating their solution landscapes and enabling closed-form solutions in certain special cases. Numerical results show that reflections also occur when optimizing standard deep networks using standard non-convex optimizers. Additionally, we demonstrate our theory with autoregressive time series models.
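The reflection features can be made concrete with a small sketch. Reflecting a data point a_i about another data point a_j gives 2a_j - a_i, and a composition of two absolute values such as |(|x - a_j| - |a_i - a_j|)| has a kink exactly at that mirror point (as well as at a_i and a_j). The helper names below are hypothetical, and treating each additional layer as one more round of reflections about the data points is a simplifying assumption for illustration, not the paper's exact deep dictionary.

```python
import numpy as np

def reflect_breakpoints(points, data):
    """One round of reflections: mirror each point p about each data point a_j, giving 2*a_j - p.

    Assumption: reflections are taken about the data points only; the result is the
    sorted union of the original points and their mirror images.
    """
    pts = np.asarray(points, dtype=float)
    a = np.asarray(data, dtype=float)
    reflected = (2.0 * a[:, None] - pts[None, :]).ravel()
    return np.unique(np.concatenate([pts, reflected]))

def nested_abs_feature(x, a_i, a_j):
    """|(|x - a_j| - |a_i - a_j|)|: piecewise linear with kinks at a_i, a_j, and 2*a_j - a_i."""
    return np.abs(np.abs(x - a_j) - abs(a_i - a_j))

def second_difference(f, x0, h=1e-3):
    """Discrete second difference; nonzero only where f has a kink inside [x0 - h, x0 + h]."""
    return f(x0 - h) - 2.0 * f(x0) + f(x0 + h)
```

For data `[0, 1, 3]`, `reflect_breakpoints(data, data)` adds mirror images such as 2 (the point 0 reflected about 1) and 6 (0 reflected about 3); applying it again, `reflect_breakpoints(reflect_breakpoints(data, data), data)`, mimics "reflections of these reflections" under the same assumption. The second difference of `nested_abs_feature` with `a_i = 0`, `a_j = 1` is positive at x = 2 and vanishes at a non-kink point such as x = 2.5.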
