Dynamic Spectral Backpropagation for Efficient Neural Network Training

Mannmohan Muthuraman

TL;DR

DSBP addresses training efficiency under limited data and compute by projecting layerwise gradients onto the top-$k$ eigenvectors of covariances, reducing per-layer cost to $O(k d_l)$ and biasing updates toward flatter minima. It introduces five extensions (dynamic spectral inference, spectral architecture optimization, spectral meta-learning, spectral transfer regularization, and Lie-algebra-inspired dynamics), grounded in a third-order stochastic differential equation and a PAC-Bayes generalization bound. Empirical results on CIFAR-10, Fashion-MNIST, MedMNIST, and Tiny ImageNet show DSBP consistently outperforming SAM, LoRA, and MAML in accuracy and training efficiency, with ablations highlighting the importance of $k$, $p$, and pruning. The work offers a scalable, theoretically grounded framework for robust, efficient training and points to future directions in scalability, fairness, robotics, and ethical deployment.

Abstract

Dynamic Spectral Backpropagation (DSBP) enhances neural network training under resource constraints by projecting gradients onto principal eigenvectors, reducing complexity and promoting flat minima. Five extensions are proposed to address challenges in robustness, few-shot learning, and hardware efficiency: dynamic spectral inference, spectral architecture optimization, spectral meta-learning, spectral transfer regularization, and Lie-algebra-inspired dynamics. Supported by a third-order stochastic differential equation (SDE) and a PAC-Bayes bound, DSBP outperforms Sharpness-Aware Minimization (SAM), Low-Rank Adaptation (LoRA), and Model-Agnostic Meta-Learning (MAML) on CIFAR-10, Fashion-MNIST, MedMNIST, and Tiny ImageNet, as demonstrated through extensive experiments and visualizations. Future work focuses on scalability, bias mitigation, and ethical considerations.
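The core projection step can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the summary does not say which covariance is used, so the sketch assumes a layer's activation covariance, and the names `top_k_eigvecs` and `project_gradient`, the toy dimensions, and the synthetic data are all hypothetical. It shows only that, once an orthonormal basis `V` of shape $(d_l, k)$ is available, projecting a gradient costs two matrix-vector products of size $k \times d_l$.

```python
import numpy as np

def top_k_eigvecs(cov, k):
    """Return the k eigenvectors of a symmetric matrix with the largest eigenvalues."""
    # np.linalg.eigh returns eigenvalues in ascending order, so take the last k columns.
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, -k:][:, ::-1]  # shape (d, k), columns ordered by descending eigenvalue

def project_gradient(grad, basis):
    """Project grad (shape (d,)) onto the span of the orthonormal columns of basis (d, k)."""
    coeffs = basis.T @ grad  # k coefficients, O(k d)
    return basis @ coeffs    # back to d dimensions, O(k d)

# Toy setup (assumed, not from the paper): n activation samples of width d.
rng = np.random.default_rng(0)
d, k, n = 64, 4, 256
acts = rng.standard_normal((n, d)) * np.linspace(3.0, 0.1, d)  # anisotropic activations
cov = np.cov(acts, rowvar=False)  # (d, d) covariance; rows of acts are observations

V = top_k_eigvecs(cov, k)
g = rng.standard_normal(d)       # stand-in for a flattened layer gradient
g_proj = project_gradient(g, V)  # gradient restricted to the top-k subspace
```

Note that the full eigendecomposition in this sketch costs $O(d^3)$; only the projection itself is $O(k d)$. How the paper obtains or updates the eigenvectors cheaply is not stated in this summary.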

Paper Structure

This paper contains 21 sections, 2 theorems, 44 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Proposition 1

Under Lipschitz gradients and bounded third derivatives, the SDE is an order-1 weak approximation, with error $\mathcal{O}(\eta)$.
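
For context, "order-1 weak approximation" is usually understood in the following sense. This is the standard definition from the SDE-approximation literature, not a formula quoted from the paper. The symbols $x_k$ (discrete iterates), $X_t$ (SDE solution), $g$ (a sufficiently smooth test function), $T$ (time horizon), and $C$ (a constant independent of $\eta$) are notation assumed here, and $\eta$ is taken to be the step size:

$$
\max_{0 \le k \le \lfloor T/\eta \rfloor} \left| \mathbb{E}\, g(x_k) - \mathbb{E}\, g(X_{k\eta}) \right| \le C\,\eta .
$$

Under this reading, the $\mathcal{O}(\eta)$ error bounds the gap between expectations of test functions, not the distance between individual sample paths.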

Figures (4)

  • Figure 1: Gradient alignment and eigenvalue trends over training epochs. (a) Gradient eigenvector alignment (unitless) vs. epochs. (b) Top Hessian eigenvalue (unitless) vs. epochs.
  • Figure 2: Tensor stratification. A $40 \times 40 \times 40$ activation tensor before (blue) and after (red) DSBP projection, showing spatial coordinates (X, Y, Z).
  • Figure 3: Loss landscape and spectral variance. (a) Loss landscape slice showing loss (unitless) vs. principal weight directions (unitless). (b) Eigenvalue variance (unitless) across layer indices.
  • Figure 4: Perturbation Dynamics

Theorems & Definitions (2)

  • Proposition 1: Order 1 Approximation
  • Theorem 1: Generalization