HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion

Junhao Su; Chenghao He; Feiyu Zhu; Xiaojie Xu; Dongzhi Guan; Chenyang Si

TL;DR

This work addresses the high memory cost of end-to-end backpropagation (BP) and the limited inter-module interaction of local learning by introducing HPFF, a framework that combines Hierarchical Locally Supervised Learning (HiLo) with Patch Feature Fusion (PFF). HiLo uses a two-level local-module architecture (independent modules and cascade modules) with multiple gradient-isolated auxiliary networks that provide both local and inter-module supervision, promoting global feature integration. PFF further reduces memory usage by computing auxiliary-network supervision on small patches of the features and averaging the results, which emphasizes patterns shared across patches. On CIFAR-10, STL-10, SVHN, and ImageNet, HPFF achieves state-of-the-art performance while substantially reducing GPU memory usage, offering a plug-and-play way to narrow the gap between local learning methods and end-to-end BP in practical settings.

Abstract

Traditional deep learning relies on end-to-end backpropagation for training, but this approach suffers from drawbacks such as high memory consumption and poor alignment with biological neural networks. Recent work has introduced locally supervised learning, which divides a network into gradient-isolated modules and trains each of them locally. However, this approach can lag in performance because interaction between the modules is limited, and the auxiliary networks themselves consume additional GPU memory. To overcome these limitations, we propose a novel model called HPFF that performs hierarchical locally supervised learning and patch-level feature computation on the auxiliary networks. Hierarchical Locally Supervised Learning (HiLo) enables the network to learn features at different granularity levels along their respective local paths. Specifically, the network is divided into two levels of local modules: independent local modules and cascade local modules. Each cascade local module combines two adjacent independent local modules, incorporating both the updates within the modules themselves and the information exchange between adjacent modules. Patch Feature Fusion (PFF) reduces GPU memory usage by splitting the input features of the auxiliary networks into patches for computation. By averaging these patch-level features, it encourages the network to focus on patterns that are prevalent across multiple patches. Furthermore, our method generalizes well and can be seamlessly integrated with existing techniques. We conduct experiments on the CIFAR-10, STL-10, SVHN, and ImageNet datasets, and the results demonstrate that HPFF significantly outperforms previous approaches, consistently achieving state-of-the-art performance across datasets. Our code is available at: https://github.com/Zeudfish/HPFF.
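
The patch-level computation described in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the authors' repository. The function names, the non-overlapping square patch grid, the toy auxiliary head (global average pooling followed by a linear layer), and the choice to average the head's per-patch outputs are all assumptions made for illustration; the abstract only states that the auxiliary-network input is split into patches and that patch-level features are averaged.

```python
import numpy as np


def split_into_patches(features, patches_per_side):
    """Split (N, C, H, W) features into non-overlapping spatial patches.

    Assumption: a square grid of equal-sized patches; the abstract does
    not specify the patch layout.
    """
    _, _, h, w = features.shape
    if h % patches_per_side or w % patches_per_side:
        raise ValueError("H and W must be divisible by patches_per_side")
    ph, pw = h // patches_per_side, w // patches_per_side
    return [
        features[:, :, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
        for i in range(patches_per_side)
        for j in range(patches_per_side)
    ]


def toy_auxiliary_head(patch, weight):
    """Hypothetical auxiliary network: global average pool, then linear map."""
    pooled = patch.mean(axis=(2, 3))  # shape (N, C)
    return pooled @ weight            # shape (N, num_classes)


def patch_feature_fusion(features, weight, patches_per_side=2):
    """Run the auxiliary head on one patch at a time and average the outputs.

    Only a single patch passes through the head at any moment, which is
    the intuition behind the memory saving; a running sum avoids storing
    every per-patch output.
    """
    patches = split_into_patches(features, patches_per_side)
    running_sum = None
    for patch in patches:
        out = toy_auxiliary_head(patch, weight)
        running_sum = out if running_sum is None else running_sum + out
    return running_sum / len(patches)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(4, 8, 16, 16))  # (batch, channels, H, W)
    w = rng.normal(size=(8, 10))             # channels -> 10 classes
    fused = patch_feature_fusion(feats, w, patches_per_side=2)
    print(fused.shape)
```

Because this toy head is linear and the patches have equal size, the fused output equals what the head would produce on the full feature map. With a nonlinear auxiliary network the two generally differ, and the averaging then acts as a fusion of per-patch predictions.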

Paper Structure (14 sections, 7 equations, 6 figures, 5 tables)

Introduction
Related Work
Local Learning
Alternatives of backpropagation
Method
Preliminaries
Hierarchical Local Modules
Patch Feature Fusion for Auxiliary Network
Experiment
Experimental Setup
Implement Detail
Comparison with the SOTA results
Ablation Study
Conclusion

Figures (6)

Figure 1: Test accuracy of different methods with and without HPFF. Results are obtained using ResNet-110 (K=55) on the CIFAR-10, STL-10, and SVHN datasets. An asterisk (*) denotes the addition of our HPFF.
Figure 2: The overall HPFF architecture. (a) End-to-end (E2E) training; (b) other supervised local learning methods; (c) HPFF. The network is divided into K local modules in the figure. IAN stands for Independent Auxiliary Network, and CAN stands for Cascade Auxiliary Network.
Figure 3: t-SNE visualizations of (a) the independent level, (b) the cascade level, and (c) HiLo, computed on the CIFAR-10 dataset with ResNet-32 (K=16) as the backbone. The target class is shown in blue and the non-target class in yellow.
Figure 4: Comparison of layer-wise linear separability across different learning rules on ResNet-32 and ResNet-110.
Figure 5: Comparison of layer-wise representation similarity. We use CKA (Kornblith et al., 2019) to measure the layer-wise similarity of representations between BP and our methods.
...and 1 more figure
