Replacement Learning: Training Vision Tasks with Fewer Learnable Parameters

Yuming Zhang, Peizhe Wang, Shouxin Zhang, Dongzhi Guan, Jiabin Liu, Junhao Su

TL;DR

Experimental results demonstrate that the proposed Replacement Learning approach reduces the number of parameters, training time, and memory consumption while completely surpassing the performance of end-to-end training.

Abstract

Traditional end-to-end deep learning models often enhance feature representation and overall performance by increasing the depth and complexity of the network during training. However, this approach inevitably introduces issues of parameter redundancy and resource inefficiency, especially in deeper networks. While existing works attempt to skip certain redundant layers to alleviate these problems, challenges related to poor performance, computational complexity, and inefficient memory usage remain. To address these issues, we propose an innovative training approach called Replacement Learning, which mitigates these limitations by completely replacing all the parameters of the frozen layers with only two learnable parameters. Specifically, Replacement Learning selectively freezes the parameters of certain layers, and the frozen layers utilize parameters from adjacent layers, updating them through a parameter integration mechanism controlled by two learnable parameters. This method leverages information from surrounding structures, reduces computation, conserves GPU memory, and maintains a balance between historical context and new inputs, ultimately enhancing overall model performance. We conducted experiments across four benchmark datasets, including CIFAR-10, STL-10, SVHN, and ImageNet, utilizing various architectures such as CNNs and ViTs to validate the effectiveness of Replacement Learning. Experimental results demonstrate that our approach reduces the number of parameters, training time, and memory consumption while completely surpassing the performance of end-to-end training.
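The parameter integration step described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formula, so the element-wise weighted sum below is an assumption, as are the names `integrate_parameters`, `count_learnable`, `alpha`, and `beta`, and the requirement that the two neighbouring layers have matching shapes. It is a toy illustration of the idea, not the authors' implementation.

```python
from typing import List

Vector = List[float]


def integrate_parameters(prev_params: Vector, next_params: Vector,
                         alpha: float, beta: float) -> Vector:
    """Build a frozen layer's parameters from its two neighbours.

    Assumed form (not confirmed by the abstract): an element-wise
    weighted sum alpha * prev + beta * next, where alpha and beta are
    the only two learnable values attached to the frozen layer.
    """
    if len(prev_params) != len(next_params):
        raise ValueError("neighbouring layers must have matching shapes")
    return [alpha * p + beta * n for p, n in zip(prev_params, next_params)]


def count_learnable(layer_sizes: List[int], frozen: List[int]) -> int:
    """Count learnable values when each frozen layer keeps only two scalars."""
    frozen_set = set(frozen)
    return sum(2 if i in frozen_set else size
               for i, size in enumerate(layer_sizes))


# Hypothetical toy network: three layers of 4 parameters, middle layer frozen.
prev_layer = [1.0, 2.0, 3.0, 4.0]
next_layer = [3.0, 2.0, 1.0, 0.0]
frozen_layer = integrate_parameters(prev_layer, next_layer, alpha=0.5, beta=0.5)
learnable = count_learnable([4, 4, 4], frozen=[1])
```

In this toy setup the frozen middle layer contributes two learnable values instead of four, so the network trains 10 values rather than 12. In a real network each frozen layer would replace thousands or millions of weights with two scalars, which is where the saving described in the abstract would come from.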

Paper Structure (22 sections, 19 equations, 5 figures, 3 tables, 1 algorithm)

Figures (5)

  • Figure 1: Comparison between different backbones with the training of Replacement Learning and end-to-end training regarding GPU Memory and Accuracy.
  • Figure 2: Comparison of (a) end-to-end backpropagation and (b) our proposed Replacement Learning.
  • Figure 3: Visualization of feature maps. (a) Feature map of end-to-end training. (b) Feature map of training with parameters updated by the preceding layer. (c) Feature map of training with parameters updated by the succeeding layer. (d) Feature map of Replacement Learning, parameters updated by both the preceding and succeeding layers.
  • Figure 4: Visualization of similarity matrices. (a) Similarity matrix of end-to-end training. (b) Similarity matrix of Replacement Learning.
  • Figure 5: Comparison of layer-wise linear separability. (a) Linear separability of ResNet-32 on CIFAR-10. (b) Linear separability of ResNet-110 on CIFAR-10.