Center-Sensitive Kernel Optimization for Efficient On-Device Incremental Learning
Dingwen Zhang, Yan Li, De Cheng, Nannan Wang, Junwei Han
TL;DR
This work tackles on-device incremental learning under tight compute and memory constraints, addressing catastrophic forgetting. Through an empirical study, it reveals that center kernel elements carry higher knowledge intensity for learning new tasks, especially in deeper layers. It then introduces Center-Sensitive Kernel Optimization (CsKO) and Dynamic Channel Element Selection (DCES), decoupling center kernels into a 1×1 branch and learning only the most sensitive channels via sparse orthogonal gradient projections. The approach yields strong incremental performance while dramatically reducing memory and computational overhead compared with both low-cost and conventional incremental methods, making it well-suited for edge devices in dynamic environments. Overall, CsKO with DCES offers a practical pathway to robust, efficient on-device learning, with potential extensions to broader architectures and tasks.
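The decoupling of the center kernel element into a 1×1 branch can be illustrated with a minimal NumPy sketch. It is not the authors' implementation; the helper names `conv2d_same` and `decouple_center`, the tensor shapes, and the naive convolution loop are assumptions made for illustration. The sketch only shows that a k×k convolution equals the sum of a copy with its center zeroed (which could be frozen) and a 1×1 convolution holding the center weights (which could be trained).

```python
import numpy as np

def conv2d_same(x, w):
    """Naive stride-1 'same' cross-correlation.
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Contribution of kernel element (i, j) to every output pixel.
            patch = xp[:, i:i + h, j:j + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out

def decouple_center(w):
    """Split a k x k kernel into a center-zeroed part and a 1x1 center branch.
    (Hypothetical helper: which part is frozen or trained is up to the caller.)"""
    c = w.shape[-1] // 2
    center = w[:, :, c:c + 1, c:c + 1].copy()  # (C_out, C_in, 1, 1)
    frozen = w.copy()
    frozen[:, :, c, c] = 0.0                   # surrounding elements only
    return frozen, center

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))       # assumed input: 4 channels, 8x8
w = rng.standard_normal((6, 4, 3, 3))    # assumed 3x3 kernel, 6 output channels
frozen, center = decouple_center(w)

y_full = conv2d_same(x, w)
y_split = conv2d_same(x, frozen) + conv2d_same(x, center)
max_abs_diff = float(np.abs(y_full - y_split).max())
```

Because convolution is linear in the kernel, `y_full` and `y_split` agree up to floating-point error, so restricting updates to the 1×1 branch leaves the forward computation unchanged while shrinking the set of parameters that need gradients.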
Abstract
To facilitate the evolution of edge intelligence in ever-changing environments, this paper studies on-device incremental learning under limited computational resources. Current on-device training methods focus only on efficient training and do not consider catastrophic forgetting, which prevents the model from growing stronger as it continually explores the world. A direct solution is to incorporate existing incremental learning mechanisms into the on-device training framework. Unfortunately, this does not work well, because those mechanisms usually add substantial computational cost to the network optimization process, which inevitably exceeds the memory capacity of edge devices. To address this issue, this paper makes an early effort to propose a simple but effective edge-friendly incremental learning framework. Based on an empirical study of the knowledge intensity of the kernel elements of the neural network, we find that the center kernel is key to maximizing the knowledge intensity for learning new data, while freezing the other kernel elements strikes a good balance in the model's capacity to overcome catastrophic forgetting. Building on this finding, we design a center-sensitive kernel optimization framework that greatly reduces the cost of gradient computation and back-propagation. In addition, we propose a dynamic channel element selection strategy that enables a sparse orthogonal gradient projection, further reducing the optimization complexity based on the knowledge explored from the new task data. Extensive experiments validate that our method is efficient and effective; e.g., it achieves an average accuracy boost of 38.08% with even less memory and approximately the same computation compared to existing on-device training methods, indicating its significant potential for on-device incremental learning.
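The idea of a sparse orthogonal gradient projection can be sketched as follows. This is a generic illustration under stated assumptions, not the procedure from the paper: the function name `sparse_orthogonal_projection`, the top-k selection by per-channel gradient norm, and the use of an orthonormal basis spanning an old-task subspace are all assumptions introduced here.

```python
import numpy as np

def sparse_orthogonal_projection(grad, basis, k):
    """Project the gradient of k selected channels off an old-task subspace.

    grad:  (C_out, C_in) gradient of a 1x1 branch (assumed layout).
    basis: (C_in, r) orthonormal columns spanning the subspace to protect (assumed).
    k:     number of output channels allowed to update.
    """
    # Assumed selection criterion: channels with the largest gradient norm.
    scores = np.linalg.norm(grad, axis=1)
    selected = np.argsort(scores)[-k:]

    out = np.zeros_like(grad)          # unselected channels get no update
    g = grad[selected]                 # (k, C_in)
    # Remove the component lying in the protected subspace: g - g B B^T.
    out[selected] = g - (g @ basis) @ basis.T
    return out, selected

rng = np.random.default_rng(0)
grad = rng.standard_normal((6, 4))
# Orthonormal basis via reduced QR of a random matrix (illustrative only).
basis, _ = np.linalg.qr(rng.standard_normal((4, 2)))
proj, selected = sparse_orthogonal_projection(grad, basis, k=2)
```

In this sketch every row of `proj` is orthogonal to the columns of `basis`, and only the `k` selected rows are nonzero, so both the projection cost and the parameter update scale with `k` rather than with the full channel count.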
