Component-based Sketching for Deep ReLU Nets

Di Wang; Shao-Bo Lin; Deyu Meng; Feilong Cao

TL;DR

This work tackles the optimization-generalization mismatch in deep ReLU nets by introducing component-based sketching, which builds a basis from depth-exploiting components (locality, square, and product) and uses dimension leverage via minimal-energy points to construct a linear hypothesis space. Training then reduces to linear ERM on this basis, circumventing nonconvex optimization and achieving almost rate-optimal generalization while mitigating the saturation phenomenon. Theoretical results establish expressive capacity and near-optimal generalization bounds, and extensive experiments on synthetic and real-world data show superior generalization with competitive or reduced training costs compared to gradient-based methods. The approach provides a scalable, interpretable alternative to neuron-centric optimization, with practical impact for high-dimensional learning tasks where depth and smoothness play a key role.
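To make the "square" and "product" components concrete, here is a minimal sketch of one well-known way to realize them with ReLU units: the sawtooth construction, in which composing a three-unit hat function yields a piecewise-linear interpolant of x^2 on [0, 1], and a product follows from the polarization identity. This is an illustration under stated assumptions, not necessarily the construction used in the paper; the function names `hat`, `relu_square`, and `relu_product` and the `depth` parameter are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x):
    # Hat function on [0, 1] built from three ReLU units:
    # equals 2x on [0, 1/2], 2 - 2x on [1/2, 1], and 0 outside [0, 1].
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

def relu_square(x, depth=6):
    """Approximate x**2 on [0, 1] by x - sum_s hat^(s)(x) / 4**s (assumed construction)."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    g = x.copy()
    for s in range(1, depth + 1):
        g = hat(g)               # s-fold composition: a sawtooth with 2**(s-1) teeth
        out = out - g / 4**s
    return out

def relu_product(x, y, depth=6):
    """Approximate x*y for x, y in [0, 1] via xy = 2*((x+y)/2)**2 - x**2/2 - y**2/2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return (2 * relu_square((x + y) / 2, depth)
            - relu_square(x, depth) / 2
            - relu_square(y, depth) / 2)

# Measure the worst-case error on a grid.
grid = np.linspace(0.0, 1.0, 1001)
square_err = np.max(np.abs(relu_square(grid) - grid**2))
xx, yy = np.meshgrid(grid[::10], grid[::10])
product_err = np.max(np.abs(relu_product(xx, yy) - xx * yy))
```

In this construction the error of the square approximation is at most 2^(-2*depth-2), so accuracy improves exponentially with depth while each extra layer adds only three units. This is the kind of depth advantage that a component-based basis aims to capture.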

Abstract

Deep learning has made profound impacts in the domains of data mining and AI, distinguished by the groundbreaking achievements in numerous real-world applications and the innovative algorithm design philosophy. However, it suffers from the inconsistency issue between optimization and generalization, as achieving good generalization, guided by the bias-variance trade-off principle, favors under-parameterized networks, whereas ensuring effective convergence of gradient-based algorithms demands over-parameterized networks. To address this issue, we develop a novel sketching scheme based on deep net components for various tasks. Specifically, we use deep net components with specific efficacy to build a sketching basis that embodies the advantages of deep networks. Subsequently, we transform deep net training into a linear empirical risk minimization problem based on the constructed basis, successfully avoiding the complicated convergence analysis of iterative algorithms. The efficacy of the proposed component-based sketching is validated through both theoretical analysis and numerical experiments. Theoretically, we show that the proposed component-based sketching provides almost optimal rates in approximating saturated functions for shallow nets and also achieves almost optimal generalization error bounds. Numerically, we demonstrate that, compared with the existing gradient-based training methods, component-based sketching possesses superior generalization performance with reduced training costs.
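The reduction of training to linear empirical risk minimization can be sketched as follows. Once the basis is fixed, only the outer coefficients are learned, so fitting is an ordinary least-squares problem with no iterative nonconvex optimization. The basis below is a stand-in chosen for brevity (ReLU hat functions centred on a uniform grid of [-1/2, 1/2]); the paper's sketching basis, built from its locality, square, and product components, is not reproduced here. The names `hat_basis`, `fit_linear_erm`, and `predict`, the grid size `n`, and the synthetic target are all assumptions made for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat_basis(x, n=10):
    """Stand-in basis: n + 1 ReLU hat functions centred at t_k = -1/2 + k/n."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    t = -0.5 + np.arange(n + 1) / n
    h = 1.0 / n
    # Each hat peaks at 1 at t_k and vanishes outside [t_k - h, t_k + h].
    return (relu(x - (t - h)) - 2 * relu(x - t) + relu(x - (t + h))) / h

def fit_linear_erm(x, y, n=10):
    """Least-squares ERM over the fixed basis; only linear coefficients are learned."""
    Phi = hat_basis(x, n)
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return coef

def predict(coef, x):
    return hat_basis(x, len(coef) - 1) @ coef

# Synthetic example (assumed target and noise level, for illustration only).
rng = np.random.default_rng(0)
x_train = rng.uniform(-0.5, 0.5, size=200)
y_train = np.sin(2 * np.pi * x_train) + 0.05 * rng.standard_normal(200)

coef = fit_linear_erm(x_train, y_train)
x_test = np.linspace(-0.5, 0.5, 101)
rmse = np.sqrt(np.mean((predict(coef, x_test) - np.sin(2 * np.pi * x_test)) ** 2))
```

Because the hypothesis space is linear in the coefficients, the fit has a closed-form solution and its cost is governed by the basis size rather than by the number of gradient steps.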

Paper Structure (17 sections, 14 theorems, 82 equations, 18 figures, 5 tables, 1 algorithm)


Introduction
  Tug of war between optimization and generalization
  Motivations and road-map
  Main contributions
Construction of Deep Net Components
  Power of depth of deep nets
  Construction of deep net components
Component-based Sketching for Deep ReLU Nets
  Construction of sketching basis via dimension leverage
  Component-based sketching for deep ReLU nets
Theoretical Behaviors
  Expressivity of the sketching basis
  Generalization error for the component-based sketching algorithm
Experimental Results
  Synthetic Results
...and 2 more sections

Key Result

Proposition 1

Let $\sigma$ be the ReLU function, $t_k=-1/2+k/n$, and $T_{\tau,t_{k-1},t_k}$ be defined in Localized-identifier. Then,

Figures (18)

Figure 1: Tug of war between optimization and generalization
Figure 2: Flow of component-based training schemes
Figure 3: Flow of component-based sketching schemes
Figure 4: Cubic locality versus cone-type locality
Figure 5: Road-map of the component-based sketching schemes for deep nets
...and 13 more figures

Theorems & Definitions (15)

Definition 1
Proposition 1
Proposition 2
Proposition 3
Lemma 1
Lemma 2
Proposition 4
Theorem 1
Theorem 2
Theorem 3
...and 5 more
