Approximated Likelihood Ratio: A Forward-Only and Parallel Framework for Boosting Neural Network Training

Zeliang Zhang; Jinyang Jiang; Zhuo Liu; Susan Liang; Yijie Peng; Chenliang Xu

TL;DR

This work targets the memory and compute bottlenecks of likelihood ratio (LR) gradient estimation as an alternative to backpropagation. It introduces an approximated LR method that uses sign encoding for the dominant LR term, yielding a surrogate ascent direction $\tilde{g}_k$ with significantly lower memory demands while preserving convergence guarantees. The authors prove convergence to a unique optimum under standard assumptions and propose data- and layer-level parallelism plus a forward-only hardware-efficient training pipeline to accelerate training. Empirically, the approximated LR approach shows competitive accuracy across CIFAR-100, Tiny-ImageNet, and diverse architectures, while delivering substantial memory reductions and runtime speedups, highlighting the practical potential of LR-based training as a scalable, biologically plausible alternative to BP.

Abstract

Efficient and biologically plausible alternatives to backpropagation in neural network training remain a challenge due to issues such as high computational complexity and additional assumptions about neural networks, which limit scalability to deeper networks. The likelihood ratio (LR) method offers a promising gradient estimation strategy but is constrained by significant memory consumption, especially when deploying multiple copies of data to reduce estimation variance. In this paper, we introduce an approximation technique for the LR method to alleviate computational and memory demands in gradient estimation. By exploiting the natural parallelism during the backward pass using LR, we further provide a high-performance training strategy that pipelines both the forward and backward passes, making it more suitable for computation on specialized hardware. Extensive experiments demonstrate the effectiveness of the approximation technique in neural network training. This work underscores the potential of the likelihood ratio method in achieving high-performance neural network training, suggesting avenues for further exploration.
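
To make the idea in the abstract concrete, the following is a minimal NumPy sketch of an LR-style gradient estimate for a single linear layer, next to a sign-encoded variant. It is an illustration under stated assumptions, not the authors' implementation: the toy squared-error loss, the Gaussian noise injected at the layer output, the function names, and the choice to sign-encode both the noise and the input are all assumptions made for this example. The summary above only states that the dominant LR term is sign-encoded.

```python
import numpy as np


def noisy_losses(W, x, y, eps, sigma):
    """Squared-error loss of each noisy copy; eps has shape (n_copies, d_out)."""
    out = W @ x + sigma * eps  # broadcasts to (n_copies, d_out)
    return 0.5 * np.sum((out - y) ** 2, axis=1)


def lr_gradient(W, x, y, sigma, n_copies, rng):
    """Forward-only LR estimate of dE[loss]/dW using n_copies noisy forward passes."""
    eps = rng.standard_normal((n_copies, W.shape[0]))
    losses = noisy_losses(W, x, y, eps, sigma)
    # Score-function identity for Gaussian output noise:
    # dE[loss]/dW = E[loss * eps / sigma] x^T.
    return np.outer(losses @ eps / (sigma * n_copies), x)


def sign_lr_gradient(W, x, y, sigma, n_copies, rng):
    """Sign-encoded surrogate (assumed form): keep only signs of noise and input."""
    eps = rng.standard_normal((n_copies, W.shape[0]))
    losses = noisy_losses(W, x, y, eps, sigma)
    sign_eps = np.sign(eps).astype(np.int8)  # one byte per entry instead of eight
    sign_x = np.sign(x).astype(np.int8)
    # Only a surrogate direction: the scale differs from the true gradient.
    return np.outer(losses @ sign_eps / n_copies, sign_x)


def cosine(a, b):
    """Cosine similarity between two arrays of the same shape."""
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

For this toy loss the exact gradient is `np.outer(W @ x - y, x)`, so `cosine` can be used to check how well each estimate aligns with it. The sign-encoded variant does not recover the gradient's magnitude; in this sketch it only keeps a positive inner product with the true gradient, which is the sense in which it acts as a surrogate direction.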

Paper Structure (22 sections, 3 theorems, 17 equations, 6 figures, 4 tables)

Introduction
LR Method for DNN Training
Approximated LR Method
Approximation Through Sign Encoding
Convergence Analysis
Parallel Analysis and Implementations
Data-level parallel.
Layer-level parallel.
Hardware-efficient LR Training Pipeline
Evaluations
Verification Study on the Approximation
Performance Evaluation on Larger Datasets
Evaluation on CIFAR-100 dataset.
Evaluation on Tiny-ImageNet dataset.
Generalization to more architectures.
...and 7 more sections

Key Result

Lemma 1

If Assumptions a1 and a2 hold, then $\tilde{\omega}^*$ is the unique global asymptotically stable equilibrium of ODE (proj_ode).

Figures (6)

Figure 1: Training ResNet-9 on the CIFAR-100 dataset using LR, ES, and corresponding approximated methods, ALR and AES.
Figure 2: The design of hardware-efficient LR training, which pipelines both the forward and gradient computation processes.
Figure 3: Learning curves of ResNet-5 on CIFAR-10.
Figure 4: Running efficiency of the LR training with integration of approximation (A-) and pipeline (-P). The black line indicates the number of iterations processed per second using BP.
Figure 5: From left to right, gradient cosine similarities between Hybrid and BP, A-Hybrid and BP.
...and 1 more figures

Theorems & Definitions (5)

Lemma 1
proof
Theorem 1
Theorem 2
proof
