
Approximated Likelihood Ratio: A Forward-Only and Parallel Framework for Boosting Neural Network Training

Zeliang Zhang, Jinyang Jiang, Zhuo Liu, Susan Liang, Yijie Peng, Chenliang Xu

TL;DR

This work targets the memory and compute bottlenecks of likelihood ratio (LR) gradient estimation as an alternative to backpropagation. It introduces an approximated LR method that uses sign encoding for the dominant LR term, yielding a surrogate ascent direction $\tilde{g}_k$ with significantly lower memory demands while preserving convergence guarantees. The authors prove convergence to a unique optimum under standard assumptions and propose data- and layer-level parallelism plus a forward-only hardware-efficient training pipeline to accelerate training. Empirically, the approximated LR approach shows competitive accuracy across CIFAR-100, Tiny-ImageNet, and diverse architectures, while delivering substantial memory reductions and runtime speedups, highlighting the practical potential of LR-based training as a scalable, biologically plausible alternative to BP.

Abstract

Efficient and biologically plausible alternatives to backpropagation in neural network training remain a challenge due to issues such as high computational complexity and additional assumptions about neural networks, which limit scalability to deeper networks. The likelihood ratio method offers a promising gradient estimation strategy but is constrained by significant memory consumption, especially when deploying multiple copies of data to reduce estimation variance. In this paper, we introduce an approximation technique for the likelihood ratio (LR) method to alleviate computational and memory demands in gradient estimation. By exploiting the natural parallelism during the backward pass using LR, we further provide a high-performance training strategy, which pipelines both the forward and backward pass, to make it more suitable for the computation on specialized hardware. Extensive experiments demonstrate the effectiveness of the approximation technique in neural network training. This work underscores the potential of the likelihood ratio method in achieving high-performance neural network training, suggesting avenues for further exploration.
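
The sign-encoding idea mentioned in the TL;DR can be illustrated on a toy problem. The sketch below is not the authors' implementation. It assumes a single linear layer with Gaussian noise injected into its output, a quadratic loss, and that the sign is applied to the injected noise. The names `lr_gradients` and `cosine`, the noise scale `sigma`, and the sample count are hypothetical choices made for illustration only.

```python
import numpy as np


def lr_gradients(W, x, y, sigma=0.5, n_samples=200_000, seed=0):
    """Forward-only gradient estimates for the loss 0.5 * ||W x + noise - y||^2.

    Returns (exact-noise LR estimate, sign-encoded estimate, analytic gradient).
    Assumption: Gaussian noise is added to the layer output, and the
    approximation replaces that noise by its sign.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_samples, W.shape[0]))  # one noise vector per copy

    # Forward passes only: perturb the layer output and record each loss.
    v = W @ x + sigma * eps                              # shape (n_samples, d_out)
    loss = 0.5 * np.sum((v - y) ** 2, axis=1)            # shape (n_samples,)

    # Likelihood ratio estimate: E[loss * eps / sigma] x^T.
    g_lr = np.outer((loss[:, None] * eps).mean(axis=0) / sigma, x)

    # Sign-encoded variant: only the sign of the noise is kept, so it can be
    # stored in a narrow integer type instead of a full float per entry.
    signs = np.sign(eps).astype(np.int8)
    g_sign = np.outer((loss[:, None] * signs).mean(axis=0) / sigma, x)

    # Analytic gradient of the expected loss, used here only for comparison.
    g_true = np.outer(W @ x - y, x)
    return g_lr, g_sign, g_true


def cosine(a, b):
    """Cosine similarity between two arrays of the same shape."""
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    W = np.array([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]])
    x = np.array([1.0, -1.0, 0.5])
    y = np.array([0.0, 1.0])
    g_lr, g_sign, g_true = lr_gradients(W, x, y)
    print("cosine(LR, true):  ", cosine(g_lr, g_true))
    print("cosine(sign, true):", cosine(g_sign, g_true))
```

In this toy setting both estimates should align closely with the analytic gradient. For a quadratic loss, the sign-based estimate is, in expectation, the exact one scaled by $\sqrt{2/\pi}$, so it points in the same direction even though it is not an unbiased gradient estimate. Whether this carries over to deep networks is what the paper's convergence analysis and experiments address.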

Paper Structure (22 sections, 3 theorems, 17 equations, 6 figures, 4 tables)

Key Result

Lemma 1

If Assumptions a1 and a2 hold, then $\tilde{\omega}^*$ is the unique global asymptotically stable equilibrium of ODE (proj_ode).

Figures (6)

  • Figure 1: Training ResNet-9 on the CIFAR-100 dataset using LR, ES, and corresponding approximated methods, ALR and AES.
  • Figure 2: The design of hardware-efficient LR training, which pipelines both forward and gradient computation process.
  • Figure 3: Learning curves of ResNet-5 on CIFAR-10.
  • Figure 4: Running efficiency of the LR training with integration of approximation (A-) and pipeline (-P). The black line indicates the number of iterations processed per second using BP.
  • Figure 5: From left to right, gradient cosine similarities between Hybrid and BP, A-Hybrid and BP.
  • ...and 1 more figure

Theorems & Definitions (5)

  • Lemma 1
  • proof
  • Theorem 1
  • Theorem 2
  • proof