Momentum Auxiliary Network for Supervised Local Learning

Junhao Su; Changpeng Cai; Feiyu Zhu; Chenghao He; Xiaojie Xu; Dongzhi Guan; Chenyang Si

TL;DR

This paper proposes a Momentum Auxiliary Network (MAN) for supervised local learning. MAN establishes a dynamic interaction mechanism between gradient-isolated local blocks by using an exponential moving average (EMA) of the parameters from adjacent blocks to improve information flow. On ImageNet, it reduces GPU memory usage by more than 45% compared to end-to-end training while achieving higher performance.

Abstract

Deep neural networks are conventionally trained with end-to-end backpropagation, which lacks biological credibility and triggers a locking dilemma during network parameter updates, leading to significant GPU memory use. Supervised local learning instead segments the network into multiple local blocks, each updated by an independent auxiliary network. However, these methods cannot replace end-to-end training because of their lower accuracy: gradients propagate only within each local block, so no information is exchanged between blocks. To address this issue and establish information transfer across blocks, we propose a Momentum Auxiliary Network (MAN) that establishes a dynamic interaction mechanism. The MAN leverages an exponential moving average (EMA) of the parameters from adjacent local blocks to enhance information flow. This auxiliary network, updated through EMA, helps bridge the informational gap between blocks. Nevertheless, we observe that directly applying EMA parameters has certain limitations due to feature discrepancies among local blocks. To overcome this, we introduce learnable biases, which further boost performance. We have validated our method on four image classification datasets (CIFAR-10, STL-10, SVHN, ImageNet), attaining superior performance and substantial memory savings. Notably, our method reduces GPU memory usage by more than 45% on the ImageNet dataset compared to end-to-end training, while achieving higher performance. The Momentum Auxiliary Network thus offers a new perspective for supervised local learning. Our code is available at: https://github.com/JunhaoSu0/MAN.
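
The EMA mechanism described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it only shows the generic EMA rule, in which the auxiliary parameters drift toward the parameters of the adjacent block, followed by a learnable bias. The function names `ema_update` and `aux_forward`, the momentum values, the toy linear layer, and the way the bias is added to the output are all assumptions made for illustration.

```python
from typing import List


def ema_update(aux: List[float], src: List[float], momentum: float = 0.995) -> List[float]:
    """Generic EMA rule: momentum * aux + (1 - momentum) * src, element-wise.

    `aux` stands in for the auxiliary network's copy of the parameters and
    `src` for the parameters of the adjacent local block (hypothetical names;
    the default momentum is an illustrative choice, not a value from the paper).
    """
    if len(aux) != len(src):
        raise ValueError("parameter lists must have the same length")
    return [momentum * a + (1.0 - momentum) * s for a, s in zip(aux, src)]


def aux_forward(features: List[float], ema_weights: List[float], bias: float) -> float:
    """Toy linear auxiliary layer: dot(features, ema_weights) + bias.

    The bias plays the role of the learnable term that compensates for feature
    discrepancies between blocks; adding it to the output is an assumption.
    """
    return sum(f * w for f, w in zip(features, ema_weights)) + bias


# The auxiliary copy starts at zero and tracks the adjacent block's parameters.
aux_weights = [0.0, 0.0]
next_block_first_layer = [1.0, -2.0]
for _ in range(3):
    aux_weights = ema_update(aux_weights, next_block_first_layer, momentum=0.5)

output = aux_forward([1.0, 1.0], aux_weights, bias=0.25)
```

In a real training loop the EMA copy would receive no gradients of its own, while the bias would be trained by the local loss; both details are omitted here.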

Paper Structure (16 sections, 7 equations, 7 figures, 5 tables)

Introduction
Related Work
Local Learning
Alternative Learning Rules to E2E Training
Method
Preliminaries
Momentum Auxiliary Network
Experiments
Experimental Setup
Implementation Details
Results on Image Classification Datasets
Ablation Studies
The Effectiveness of EMA
Linear Separability Analysis
Representation Similarity Analysis
...and 1 more section

Figures (7)

Figure 1: Comparison between different methods with MAN and the original methods in terms of accuracy. Results are obtained using ResNet-110 (K=55) on the various datasets.
Figure 2: Comparison of (a) end-to-end backpropagation, (b) other supervised local learning methods, and (c) our proposed method. Unlike E2E, supervised local learning separates the network into K gradient-isolated local blocks.
Figure 3: Details of the Momentum Auxiliary Network. Local (i+1) represents the (i+1)-th gradient-isolated local block, which contains layers from layer m to layer (m+n), totaling n+1 layers ($n \geq 0$). Only the parameters of the first layer are used, to keep GPU memory usage balanced.
Figure 4: Training-accuracy curves. The left plot uses ResNet-32 (K=16) as the backbone and the right plot uses ResNet-110 (K=55). Both use the CIFAR-10 dataset.
Figure 5: Feature map comparison of (a) the original method without MAN, (b) MAN with only EMA, (c) MAN with only learnable bias, and (d) MAN with both EMA and learnable bias. The feature maps are obtained using ResNet-32 (K=16) on CIFAR-10.
...and 2 more figures
