MLAAN: Scaling Supervised Local Learning with Multilaminar Leap Augmented Auxiliary Network

Yuming Zhang, Shouxin Zhang, Peizhe Wang, Feiyu Zhu, Dongzhi Guan, Junhao Su, Jiabin Liu, Changpeng Cai

TL;DR

End-to-end backpropagation imposes high memory and efficiency costs, motivating a local-learning approach. MLAAN introduces two synergistic components: Multilaminar Local Modules (MLM), which capture both local and global features via independent and cascaded auxiliary networks, and Leap Augmented Modules (LAM), which use an Exponential Moving Average (EMA) to pass information from deeper layers back to earlier ones, mitigating the myopia of purely local training. The method substantially improves accuracy on CIFAR-10, STL-10, SVHN, and ImageNet while reducing GPU memory usage, and in some settings it even outperforms E2E baselines. As a plug-and-play framework, MLAAN strengthens global feature learning within gradient-isolated modules, enabling more scalable and memory-efficient supervised local learning for large models.
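
The EMA mechanism mentioned above can be made concrete with a short sketch. The snippet below maintains an EMA copy of a deeper module's weights, which an earlier block's auxiliary network could then consult; the function name and decay value are illustrative assumptions, not the paper's code.

```python
import torch

@torch.no_grad()
def ema_update(ema_module: torch.nn.Module,
               live_module: torch.nn.Module,
               decay: float = 0.99) -> None:
    """Track a slowly moving average of a deeper module's parameters.

    An earlier gradient-isolated block can route its features through this
    EMA copy, receiving information from deeper layers without any gradient
    crossing the module boundary (illustrative sketch, not the paper's code).
    """
    for ema_p, live_p in zip(ema_module.parameters(), live_module.parameters()):
        ema_p.mul_(decay).add_(live_p, alpha=1.0 - decay)
```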

Abstract

Deep neural networks (DNNs) typically employ an end-to-end (E2E) training paradigm, which presents several challenges, including high GPU memory consumption, inefficiency, and difficulties in model parallelization during training. Recent research has sought to address these issues, with one promising approach being local learning. This method partitions the backbone network into gradient-isolated modules and manually designs auxiliary networks to train these local modules. Existing methods often neglect the exchange of information between local modules, leading to myopia and a performance gap relative to E2E training. To address these limitations, we propose the Multilaminar Leap Augmented Auxiliary Network (MLAAN). Specifically, MLAAN comprises Multilaminar Local Modules (MLM) and Leap Augmented Modules (LAM). MLM captures both local and global features through independent and cascaded auxiliary networks, alleviating the performance loss caused by insufficient global features. However, overly simplistic auxiliary networks can impede MLM's ability to capture global information. To address this, we further design LAM, an enhanced auxiliary network that uses the Exponential Moving Average (EMA) method to facilitate information exchange between local modules, thereby mitigating the shortsightedness that results from inadequate interaction. The synergy between MLM and LAM yields excellent performance. Our experiments on the CIFAR-10, STL-10, SVHN, and ImageNet datasets show that MLAAN can be seamlessly integrated into existing local learning frameworks, significantly enhancing their performance and even surpassing E2E training, while also reducing GPU memory consumption.
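
To ground the local-learning paradigm the abstract describes, here is a minimal sketch of training gradient-isolated modules, each paired with a small auxiliary head that supplies its local loss. Layer shapes, names, and optimizer settings are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Backbone split into gradient-isolated blocks; each block trains against
# its own auxiliary classifier rather than a single global E2E loss.
blocks = nn.ModuleList([
    nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU()),
])
heads = nn.ModuleList([
    nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10)),
    nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10)),
])
opts = [torch.optim.SGD(list(b.parameters()) + list(h.parameters()), lr=0.1)
        for b, h in zip(blocks, heads)]
criterion = nn.CrossEntropyLoss()

def local_step(x: torch.Tensor, y: torch.Tensor) -> None:
    h = x
    for block, head, opt in zip(blocks, heads, opts):
        h = block(h)
        loss = criterion(head(h), y)  # local loss from the auxiliary head
        opt.zero_grad()
        loss.backward()               # gradients stay inside this module
        opt.step()
        h = h.detach()                # isolation: no gradient crosses here

local_step(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))
```

Because each block's update completes before the forward pass continues, activations need not be retained for a global backward pass; this is the source of the memory savings local learning offers over E2E backpropagation.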

Paper Structure

This paper contains 25 sections, 12 equations, 6 figures, and 6 tables.

Figures (6)

  • Figure 1: Comparison of different methods with MLAAN against BP in terms of GPU memory, test error, and Top-5 error. Results are obtained using ResNet-32 and ResNet-110 on CIFAR-10, and ResNet-34, ResNet-101, and ResNet-152 on ImageNet.
  • Figure 2: Comparison of (a) E2E backpropagation, (b) other supervised local learning methods, and (c) our proposed method. The details of our method are in Figure 3.
  • Figure 3: The Leap Augmented Modules architecture. As the proximity to the early blocks increases, the utilization of auxiliary layers employing EMA decreases.
  • Figure 4: Visualization of feature maps. (a) Feature map of DGL. (b) Feature map of DGL with LAM. (c) Feature map of DGL with MLM. (d) Feature map of DGL with MLAAN.
  • Figure 5: Comparison of layer-wise linear separability. (a) Linear separability of ResNet-32 on CIFAR-10. (b) Linear separability of ResNet-110 on CIFAR-10.
  • ...and 1 more figure