Orthogonal Weight Modification Enhances Learning Scalability and Convergence Efficiency without Gradient Backpropagation

Guoqing Ma; Shan Yu

TL;DR

This work finds that low rank is an inherent property of perturbation-based algorithms and proposes a perturbation-based approach, LOw-rank Cluster Orthogonal (LOCO) weight modification, that can locally train the deepest spiking neural networks to date.

Abstract

Given the substantial computational cost of backpropagation (BP), non-BP methods have emerged as attractive alternatives for efficient learning on emerging neuromorphic systems. However, existing non-BP approaches still face critical challenges in efficiency and scalability. Inspired by neural representations and dynamic mechanisms in the brain, we propose a perturbation-based approach called LOw-rank Cluster Orthogonal (LOCO) weight modification. We find that low rank is an inherent property of perturbation-based algorithms. Under this condition, the orthogonality constraint limits the variance of node perturbation (NP) gradient estimates and improves convergence efficiency. In extensive evaluations on multiple datasets, LOCO can locally train the deepest spiking neural networks to date (more than 10 layers), while exhibiting strong continual learning ability, improved convergence efficiency, and better task performance than other brain-inspired non-BP algorithms. Notably, LOCO requires only O(1) parallel time complexity for weight updates, significantly lower than that of BP methods. This offers a promising direction for high-performance, real-time, and lifelong learning on neuromorphic systems.
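The abstract relies on node perturbation (NP) gradient estimates and on the low-rank property of perturbation-based updates. The sketch below illustrates both under stated assumptions: it uses the textbook NP estimator on a single linear layer with squared-error loss and Gaussian noise injected at the output nodes, not the paper's spiking implementation, and the names `np_weight_update` and `batch_update` are hypothetical. Averaged over many perturbations, the estimate points along the true gradient. Each single-sample update is an outer product, so a minibatch of B updates has rank at most B.

```python
import numpy as np


def np_weight_update(W, x, target, sigma=1e-3, rng=None):
    """Single-sample node-perturbation (NP) estimate of dL/dW for y = W @ x.

    Assumes L(y) = 0.5 * ||y - target||^2 and Gaussian noise on the output nodes.
    """
    rng = np.random.default_rng() if rng is None else rng

    def loss(y):
        return 0.5 * np.sum((y - target) ** 2)

    y = W @ x                                    # clean forward pass
    xi = rng.normal(0.0, sigma, size=y.shape)    # perturbation of each output node
    delta_loss = loss(y + xi) - loss(y)          # scalar loss change caused by xi
    node_grad = delta_loss * xi / sigma**2       # noisy estimate of dL/dy
    return np.outer(node_grad, x)                # outer product: a rank-one update


def batch_update(W, X, targets, rng=None):
    """Sum of per-sample NP estimates over a minibatch stored as columns of X."""
    return sum(np_weight_update(W, X[:, i], targets[:, i], rng=rng)
               for i in range(X.shape[1]))


rng = np.random.default_rng(0)
W = rng.normal(size=(16, 32))
x = rng.normal(size=32)
target = rng.normal(size=16)

# Averaging many single-sample estimates approaches the true gradient (W @ x - target) x^T.
n_trials = 10_000
true_grad = np.outer(W @ x - target, x)
est = sum(np_weight_update(W, x, target, rng=rng) for _ in range(n_trials)) / n_trials
cosine = np.sum(est * true_grad) / (np.linalg.norm(est) * np.linalg.norm(true_grad))

# Summing a minibatch of 4 rank-one updates gives rank 4, well below min(16, 32).
X = rng.normal(size=(32, 4))
T = rng.normal(size=(16, 4))
rank = np.linalg.matrix_rank(batch_update(W, X, T, rng=rng))
print(f"cosine to true gradient: {cosine:.3f}; minibatch update rank: {rank}")
```

Under these assumptions, the per-sample variance of this estimator grows with the number of perturbed nodes. That is the kind of variance the abstract says the orthogonality constraint limits. Increasing `n_trials` only tightens the average; it does not reduce the variance of a single update.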

Paper Structure (11 sections, 7 equations, 4 figures)

Introduction
Related work
Method
Pipeline Overview
Low-rank Cluster Orthogonal Weight Modification (LOCO)
Calculation of projection matrix
Experiments
Experimental Setup
Results and Analyses
Conclusion
Acknowledgment

Figures (4)

Figure 1: Schematic diagram of LOCO. (A) The multi-layer architecture of the SNN. Weights connecting to the same postsynaptic neuron are marked with the same color; together they form the unit of weight modification in LOCO (see the main LOCO update equation). (B) Weight modification with NP and with LOCO. (C) During training on a new task, the NP weight modification $\Delta W^{\mathrm{NP}}$ is projected onto a subspace (green surface) in which good performance on old tasks is maintained. The weight modification actually applied is therefore $\Delta W^{\mathrm{LOCO}}$. (D) The process of calculating the projection matrix $P_l$. (E) Comparison of the variance of the parameter search under NP and LOCO: NP searches in a high-dimensional space (orange line) with high variance, whereas LOCO reduces the variance by searching in a low-dimensional subspace (green planes).
Figure 2: LOCO demonstrates superior scalability, enhanced convergence efficiency, and continual learning capabilities without gradient backpropagation. (A to C) Performance on the MNIST dataset. (A) LOCO can train deeper spiking neural networks than STDP+SBP and NP. Performance of networks with different numbers of hidden layers is color-coded. (B) Learning dynamics of LOCO for networks ranging from 3 to 10 layers. The inset shows the accuracy curve of NP. (C) Accuracy curves for a continual learning task on MNIST. The horizontal axis represents the class currently being learned, with classes trained sequentially; the vertical axis indicates the average classification accuracy on all classes learned so far. (D to F) Performance on the phonetic transcription task, presented in the same manner as in (A) and (B). Shaded regions and error bars represent the variance of performance across five runs with different seeds.
Figure 3: Low rank is an inherent property of perturbation-based algorithms.
Figure 4: The magnitude of weight changes in LOCO is smaller than that of NP. This implies that LOCO is more stable and energy-efficient.
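Figure 1 (C, D) describes projecting the NP update onto a subspace that preserves old-task performance through a projection matrix $P_l$, and Figure 4 reports smaller weight changes for LOCO. The sketch below is a minimal illustration under an assumption: it uses an OWM-style projector $P = I - A(A^\top A + \alpha I)^{-1}A^\top$, where the columns of $A$ are old-task input vectors. The paper's exact construction of $P_l$ (for example, how inputs are clustered or accumulated) may differ, and `projection_matrix` and `alpha` are hypothetical names. Every eigenvalue of this $P$ lies in $(0, 1]$. Right-multiplying an update by $P$ therefore never increases its Frobenius norm, which gives one elementary reason projected updates tend to be smaller.

```python
import numpy as np


def projection_matrix(A, alpha=1e-3):
    """OWM-style projector that suppresses directions spanned by the columns of A.

    A has shape (in_dim, n_old): each column is an input vector from an old task.
    Returns P = I - A (A^T A + alpha I)^{-1} A^T with shape (in_dim, in_dim).
    """
    in_dim, n_old = A.shape
    gram = A.T @ A + alpha * np.eye(n_old)          # regularized Gram matrix
    return np.eye(in_dim) - A @ np.linalg.solve(gram, A.T)


rng = np.random.default_rng(1)
A = rng.normal(size=(40, 6))            # six hypothetical old-task inputs, 40-d each
P = projection_matrix(A)

delta_w_np = rng.normal(size=(10, 40))  # stand-in for an NP update of a 10x40 layer W
delta_w_proj = delta_w_np @ P           # projected update, an analogue of Delta W^LOCO

# Responses W @ a to old-task inputs barely change under the projected update...
old_change_np = np.linalg.norm(delta_w_np @ A)
old_change_proj = np.linalg.norm(delta_w_proj @ A)
# ...and the projected update is never larger than the original one.
norm_np = np.linalg.norm(delta_w_np)
norm_proj = np.linalg.norm(delta_w_proj)
print(f"old-task interference: {old_change_np:.3f} -> {old_change_proj:.2e}")
print(f"update norm: {norm_np:.3f} -> {norm_proj:.3f}")
```

Smaller `alpha` values make the projector closer to an exact orthogonal projector onto the complement of span(A), at the cost of a worse-conditioned Gram matrix.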
