Component-based Sketching for Deep ReLU Nets

Di Wang, Shao-Bo Lin, Deyu Meng, Feilong Cao

TL;DR

This work tackles the optimization-generalization mismatch in deep ReLU nets by introducing component-based sketching, which builds a basis from depth-exploiting components (locality, square, and product) and uses dimension leverage via minimal-energy points to construct a linear hypothesis space. Training then reduces to linear ERM on this basis, circumventing nonconvex optimization and achieving almost rate-optimal generalization while mitigating the saturation phenomenon. Theoretical results establish expressive capacity and near-optimal generalization bounds, and extensive experiments on synthetic and real-world data show superior generalization with competitive or reduced training costs compared to gradient-based methods. The approach provides a scalable, interpretable alternative to neuron-centric optimization, with practical impact for high-dimensional learning tasks where depth and smoothness play a key role.
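
To make the "square" and "product" components concrete, here is a minimal sketch of one standard way a deep ReLU net can approximate $x^2$ and $xy$ on $[0,1]$: composing a ReLU tent map (the sawtooth construction usually attributed to Yarotsky). This is an illustrative stand-in, not necessarily the construction used in the paper; the function names `tent`, `relu_square`, and `relu_product` and the `depth` parameter are hypothetical.

```python
def relu(t):
    return max(t, 0.0)


def tent(x):
    # Tent map on [0, 1] written with three ReLU units:
    # equals 2x on [0, 1/2] and 2 - 2x on [1/2, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)


def relu_square(x, depth):
    # Piecewise-linear interpolant of x**2 on a grid of width 2**-depth,
    # obtained by subtracting scaled compositions of the tent map.
    # For x in [0, 1] the error is at most 4**-(depth + 1).
    approx, saw = x, x
    for s in range(1, depth + 1):
        saw = tent(saw)              # s-fold composition of the tent map
        approx -= saw / 4.0 ** s
    return approx


def relu_product(x, y, depth):
    # Polarization identity for x, y in [0, 1]:
    # x*y = 2*((x + y)/2)**2 - x**2/2 - y**2/2,
    # with every square replaced by its ReLU approximation.
    return (2.0 * relu_square((x + y) / 2.0, depth)
            - 0.5 * relu_square(x, depth)
            - 0.5 * relu_square(y, depth))
```

Each extra level of composition shrinks the square's error bound by a factor of four, which is the sense in which depth is being exploited here; the product inherits at most three times that error through the polarization identity.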

Abstract

Deep learning has had a profound impact on data mining and AI, marked by groundbreaking achievements in numerous real-world applications and an innovative algorithm design philosophy. However, it suffers from an inconsistency between optimization and generalization: good generalization, guided by the bias-variance trade-off principle, favors under-parameterized networks, whereas effective convergence of gradient-based algorithms demands over-parameterized networks. To address this issue, we develop a novel sketching scheme based on deep net components for various tasks. Specifically, we use deep net components with specific efficacy to build a sketching basis that embodies the advantages of deep networks. We then transform deep net training into a linear empirical risk minimization problem on the constructed basis, avoiding the complicated convergence analysis of iterative algorithms. The efficacy of the proposed component-based sketching is validated through both theoretical analysis and numerical experiments. Theoretically, we show that component-based sketching provides almost optimal rates in approximating saturated functions for shallow nets and also achieves almost optimal generalization error bounds. Numerically, we demonstrate that, compared with existing gradient-based training methods, component-based sketching delivers superior generalization performance with reduced training costs.
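
The reduction of training to linear empirical risk minimization can be illustrated with a small sketch: once a basis is fixed, only the outer coefficients are fitted, and the least-squares problem has a closed-form solution with no gradient iterations. The basis below is an assumption made for illustration only: simple ReLU "hat" bumps centered at knots $t_k=-1/2+k/n$, standing in for the paper's sketching basis (which is built from locality, square, and product components). The helper names `hat`, `features`, `solve`, `fit_linear_erm`, and `predict` are hypothetical.

```python
def relu(t):
    return max(t, 0.0)


def hat(x, center, width):
    # Piecewise-linear bump with peak 1 at `center`, built from three ReLU
    # units (a stand-in basis element, not the paper's component).
    return (relu(x - center + width) - 2.0 * relu(x - center)
            + relu(x - center - width)) / width


def features(x, n):
    # One bump per knot t_k = -1/2 + k/n, k = 0..n.
    return [hat(x, -0.5 + k / n, 1.0 / n) for k in range(n + 1)]


def solve(a, b):
    # Gaussian elimination with partial pivoting on a small dense system.
    size = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, size))
        coef[r] = (m[r][size] - tail) / m[r][r]
    return coef


def fit_linear_erm(xs, ys, n):
    # Least squares over the span of the fixed basis:
    # solve the normal equations (Phi^T Phi) c = Phi^T y.
    phi = [features(x, n) for x in xs]
    dim = n + 1
    gram = [[sum(row[i] * row[j] for row in phi) for j in range(dim)]
            for i in range(dim)]
    rhs = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(dim)]
    return solve(gram, rhs)


def predict(x, coef, n):
    return sum(c * f for c, f in zip(coef, features(x, n)))


# Toy data (synthetic, for illustration): noiseless samples of x**2 on [-1/2, 1/2].
xs = [-0.5 + i / 200.0 for i in range(201)]
ys = [x * x for x in xs]
coef = fit_linear_erm(xs, ys, n=8)
mse = sum((predict(x, coef, 8) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Because the basis functions are fixed in advance, the fit is a convex problem in `coef` alone; this is the property that lets such a scheme sidestep the convergence analysis of iterative, nonconvex training.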

Paper Structure

This paper contains 17 sections, 14 theorems, 82 equations, 18 figures, 5 tables, 1 algorithm.

Key Result

Proposition 1

Let $\sigma$ be the ReLU function, $t_k=-1/2+k/n$, and $T_{\tau,t_{k-1},t_k}$ be defined in Localized-identifier. Then,

Figures (18)

  • Figure 1: Tug of war between optimization and generalization
  • Figure 2: Flow of component-based training schemes
  • Figure 3: Flow of component-based sketching schemes
  • Figure 4: Cubic locality versus cone-type locality
  • Figure 5: Road-map of the component-based sketching schemes for deep nets
  • ...and 13 more figures

Theorems & Definitions (15)

  • Definition 1
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Lemma 1
  • Lemma 2
  • Proposition 4
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • ...and 5 more