Component-based Sketching for Deep ReLU Nets
Di Wang, Shao-Bo Lin, Deyu Meng, Feilong Cao
TL;DR
This work tackles the optimization-generalization mismatch in deep ReLU nets by introducing component-based sketching, which builds a basis from depth-exploiting components (locality, square, and product) and uses dimension leverage via minimal-energy points to construct a linear hypothesis space. Training then reduces to linear empirical risk minimization (ERM) on this basis, which circumvents nonconvex optimization, achieves almost optimal generalization rates, and mitigates the saturation phenomenon. Theoretical results establish the expressive capacity of the basis and near-optimal generalization bounds, and experiments on synthetic and real-world data show superior generalization at competitive or reduced training cost compared with gradient-based methods. The approach offers a scalable, interpretable alternative to neuron-centric optimization, aimed at high-dimensional learning tasks where depth and smoothness play a key role.
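To make the "square" and "product" components concrete, here is a minimal sketch of one well-known way a deep ReLU net can realize them: the sawtooth construction for x^2 on [0, 1], combined with a polarization identity for products. This summary does not specify the paper's components, so the function names (`hat`, `relu_square`, `relu_product`) and the `depth` parameter are illustrative assumptions rather than the authors' construction.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x):
    # Tent map on [0, 1] from three ReLU units:
    # 2x on [0, 1/2], 2(1 - x) on [1/2, 1], and 0 outside [0, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)

def relu_square(x, depth=6):
    # Approximates x**2 on [0, 1] by x - sum_{s=1..depth} hat^(s)(x) / 4**s,
    # where hat^(s) is the s-fold composition (one extra layer per term).
    # The result is the piecewise-linear interpolant of x**2 on a grid of
    # spacing 2**-depth, so the sup-norm error is at most 2**(-2*depth - 2).
    out = np.array(x, dtype=float)
    g = np.array(x, dtype=float)
    for s in range(1, depth + 1):
        g = hat(g)
        out = out - g / 4.0**s
    return out

def relu_product(x, y, depth=6):
    # Polarization identity for x, y in [0, 1]:
    # x*y = 2*((x + y)/2)**2 - x**2/2 - y**2/2, with each square
    # replaced by its ReLU approximation.
    return (2.0 * relu_square((x + y) / 2.0, depth)
            - 0.5 * relu_square(x, depth)
            - 0.5 * relu_square(y, depth))
```

Each extra layer of composition cuts the squaring error by a factor of four, which is the sense in which such components exploit depth: a shallow net with the same number of units cannot achieve this exponential accuracy.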
Abstract
Deep learning has had a profound impact on data mining and AI, marked by groundbreaking achievements in numerous real-world applications and an innovative algorithm design philosophy. However, it suffers from an inconsistency between optimization and generalization: good generalization, guided by the bias-variance trade-off principle, favors under-parameterized networks, whereas effective convergence of gradient-based algorithms demands over-parameterized networks. To address this issue, we develop a novel sketching scheme based on deep net components for various tasks. Specifically, we use deep net components with specific efficacy to build a sketching basis that embodies the advantages of deep networks. We then transform deep net training into a linear empirical risk minimization problem over the constructed basis, which avoids the complicated convergence analysis of iterative algorithms. The efficacy of the proposed component-based sketching is validated through both theoretical analysis and numerical experiments. Theoretically, we show that it attains almost optimal rates in approximating functions on which shallow nets saturate, and that it achieves almost optimal generalization error bounds. Numerically, we demonstrate that, compared with existing gradient-based training methods, component-based sketching delivers superior generalization performance with reduced training costs.
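The reduction of training to a linear empirical risk minimization problem can be illustrated with a toy one-dimensional sketch. The basis below is an assumption made for illustration only: localized ReLU bumps on a uniform grid stand in for the paper's component-based basis, and the grid stands in for its minimal-energy points. All function names are hypothetical. The point is that once the basis is fixed, fitting the model is a single least-squares solve with no gradient iterations.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def local_bump(x, center, width):
    # Hat function supported on [center - width, center + width],
    # built from three ReLU units (a simple "locality" component).
    t = (x - center) / width
    return relu(t + 1.0) - 2.0 * relu(t) + relu(t - 1.0)

def build_basis(x, centers, width):
    # Design matrix with one column per fixed (untrained) basis function.
    return np.stack([local_bump(x, c, width) for c in centers], axis=1)

def fit_linear_erm(x, y, centers, width):
    # Least-squares ERM over the span of the fixed basis: a convex
    # problem solved directly, with only the linear coefficients learned.
    design = build_basis(x, centers, width)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Toy data: noisy samples of a smooth target on [0, 1].
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=200)
y_train = np.sin(2.0 * np.pi * x_train) + 0.1 * rng.standard_normal(200)

centers = np.linspace(0.0, 1.0, 11)  # uniform grid (illustrative choice)
coef = fit_linear_erm(x_train, y_train, centers, width=0.1)

x_test = np.linspace(0.0, 1.0, 50)
y_pred = build_basis(x_test, centers, 0.1) @ coef
```

In this setup the number of basis functions plays the role of the capacity parameter in the bias-variance trade-off, and it can be kept small without affecting solvability, since no over-parameterization is needed for the solver to converge.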
