HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion
Junhao Su, Chenghao He, Feiyu Zhu, Xiaojie Xu, Dongzhi Guan, Chenyang Si
TL;DR
This work addresses the high memory cost of end-to-end backpropagation and the limited inter-module interaction of local learning by introducing HPFF, a framework that combines Hierarchical Locally Supervised Learning (HiLo) with Patch Feature Fusion (PFF). HiLo uses a two-level local-module architecture, consisting of independent modules and cascade modules, with multiple gradient-isolated auxiliary networks that provide both local and inter-module supervision and thereby promote global feature integration. PFF further reduces memory usage by running the auxiliary networks on small patches of the features and averaging the results, which preserves the patterns shared across patches. On CIFAR-10, STL-10, SVHN, and ImageNet, HPFF achieves state-of-the-art performance while substantially reducing GPU memory usage, offering a plug-and-play way to narrow the gap between local learning methods and end-to-end BP in practical settings.
Abstract
Traditional deep learning relies on end-to-end backpropagation for training, but this approach has drawbacks such as high memory consumption and a mismatch with biological neural networks. Recent work has introduced locally supervised learning, which divides a network into gradient-isolated modules and trains each one locally. However, limited interaction between these modules can cause performance to lag, and the auxiliary networks themselves consume additional GPU memory. To overcome these limitations, we propose HPFF, a novel model that performs hierarchical locally supervised learning and patch-level feature computation on the auxiliary networks. Hierarchical Locally Supervised Learning (HiLo) enables the network to learn features at different levels of granularity along their respective local paths. Specifically, the network is divided into two levels of local modules: independent local modules and cascade local modules. Each cascade local module combines two adjacent independent local modules, so that training incorporates both updates within each module and information exchange between adjacent modules. Patch Feature Fusion (PFF) reduces GPU memory usage by splitting the input features of the auxiliary networks into patches for computation. Averaging these patch-level features helps the network focus on patterns that are prevalent across multiple patches. Furthermore, our method generalizes well and can be seamlessly integrated with existing techniques. We conduct experiments on the CIFAR-10, STL-10, SVHN, and ImageNet datasets, and the results demonstrate that HPFF significantly outperforms previous approaches, consistently achieving state-of-the-art performance across datasets. Our code is available at: https://github.com/Zeudfish/HPFF.
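
As a rough illustration of the patch-splitting and averaging idea behind PFF, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes a single-channel 2-D feature map, non-overlapping square patches, and a toy auxiliary head that returns the mean and maximum of a patch. The function names `split_into_patches`, `toy_auxiliary_head`, and `patch_feature_fusion` are hypothetical and do not come from the HPFF repository.

```python
from statistics import fmean


def split_into_patches(feature_map, patch_size):
    """Split a 2-D feature map (list of rows) into non-overlapping square patches."""
    height, width = len(feature_map), len(feature_map[0])
    if height % patch_size or width % patch_size:
        raise ValueError("patch_size must divide both height and width")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append(
                [row[left:left + patch_size]
                 for row in feature_map[top:top + patch_size]]
            )
    return patches


def toy_auxiliary_head(patch):
    """Hypothetical stand-in for an auxiliary network: returns [mean, max] of a patch."""
    values = [v for row in patch for v in row]
    return [fmean(values), max(values)]


def patch_feature_fusion(feature_map, patch_size, head=toy_auxiliary_head):
    """Apply the head to each patch separately, then average the per-patch outputs."""
    patches = split_into_patches(feature_map, patch_size)
    outputs = [head(patch) for patch in patches]
    # Element-wise mean across patches: one fused vector for the whole feature map.
    return [fmean(column) for column in zip(*outputs)]


feature_map = [
    [1.0, 2.0, 5.0, 6.0],
    [3.0, 4.0, 7.0, 8.0],
    [9.0, 10.0, 13.0, 14.0],
    [11.0, 12.0, 15.0, 16.0],
]
fused = patch_feature_fusion(feature_map, patch_size=2)
print(fused)  # [8.5, 10.0]
```

In this sketch the head only ever sees one 2x2 patch at a time rather than the full 4x4 map, which is the intuition behind the memory saving. In an actual network the head would be a learned auxiliary module, and the averaged output would feed the local loss.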
