Center-Sensitive Kernel Optimization for Efficient On-Device Incremental Learning

Dingwen Zhang, Yan Li, De Cheng, Nannan Wang, Junwei Han

TL;DR

This work tackles on-device incremental learning under tight compute and memory constraints, addressing catastrophic forgetting. Through an empirical study, it reveals that center kernel elements carry higher knowledge intensity for learning new tasks, especially in deeper layers. It then introduces Center-Sensitive Kernel Optimization (CsKO) and Dynamic Channel Element Selection (DCES), decoupling center kernels into a 1×1 branch and learning only the most sensitive channels via sparse orthogonal gradient projections. The approach yields strong incremental performance while dramatically reducing memory and computational overhead compared with both low-cost and conventional incremental methods, making it well-suited for edge devices in dynamic environments. Overall, CsKO with DCES offers a practical pathway to robust, efficient on-device learning, with potential extensions to broader architectures and tasks.

Abstract

To facilitate the evolution of edge intelligence in ever-changing environments, this paper studies on-device incremental learning under limited computational resources. Current on-device training methods focus only on efficient training and do not consider catastrophic forgetting, which prevents the model from getting stronger as it continually explores the world. A direct solution is to incorporate existing incremental learning mechanisms into the on-device training framework. Unfortunately, this approach does not work well, because those mechanisms usually introduce a large additional computational cost to the network optimization process, which would inevitably exceed the memory capacity of edge devices. To address this issue, this paper makes an early effort to propose a simple but effective edge-friendly incremental learning framework. Based on an empirical study of the knowledge intensity of the kernel elements of the neural network, we find that the center kernel is the key to maximizing the knowledge intensity for learning new data, while freezing the other kernel elements strikes a good balance in the model's capacity for overcoming catastrophic forgetting. Building on this finding, we design a center-sensitive kernel optimization framework that greatly reduces the cost of gradient computation and back-propagation. In addition, a dynamic channel element selection strategy is proposed to facilitate a sparse orthogonal gradient projection, which further reduces the optimization complexity based on the knowledge explored from the new task data. Extensive experiments validate that our method is efficient and effective; e.g., it achieves an average accuracy boost of 38.08% with even less memory and comparable computation relative to existing on-device training methods, indicating its significant potential for on-device incremental learning.
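
The center-sensitive kernel optimization summarized above relies on separating the center element of a convolution kernel from the remaining elements. The following is a minimal sketch of why such a split preserves the layer's output, not the authors' implementation: because convolution is linear in its weights, a 3×3 kernel can be written as a kernel with a zeroed center plus a 1×1 kernel holding the center weight. The sketch assumes a single channel, stride 1, and zero padding; the helper names `conv2d_same` and `decouple_center` are hypothetical.

```python
import random


def conv2d_same(image, kernel):
    """Single-channel 2-D cross-correlation, stride 1, zero padding, odd square kernel."""
    h, w = len(image), len(image[0])
    r = len(kernel) // 2  # kernel radius: 1 for 3x3, 0 for 1x1
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:  # positions outside the image count as zero
                        acc += kernel[di + r][dj + r] * image[y][x]
            out[i][j] = acc
    return out


def decouple_center(kernel):
    """Split a kernel into (copy with zeroed center, 1x1 kernel holding the center weight)."""
    c = len(kernel) // 2
    frozen = [row[:] for row in kernel]
    center = [[frozen[c][c]]]
    frozen[c][c] = 0.0
    return frozen, center


random.seed(0)
image = [[random.uniform(-1, 1) for _ in range(6)] for _ in range(5)]
kernel = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]

frozen, center = decouple_center(kernel)
full = conv2d_same(image, kernel)          # original 3x3 layer
main_branch = conv2d_same(image, frozen)   # frozen surrounding elements
side_branch = conv2d_same(image, center)   # 1x1 branch: the only part that would be trained

recombined = [[m + s for m, s in zip(rm, rs)]
              for rm, rs in zip(main_branch, side_branch)]
max_diff = max(abs(a - b)
               for ra, rb in zip(full, recombined)
               for a, b in zip(ra, rb))
print(max_diff < 1e-9)
```

In this sketch only the 1×1 branch would need gradients, which is where the reduction in gradient computation and back-propagation cost would come from; the multi-channel case follows the same identity applied per input/output channel pair.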

Paper Structure

This paper contains 29 sections, 8 equations, 10 figures, 4 tables, and 1 algorithm.

Figures (10)

  • Figure 1: Our method demonstrates superior overall performance with lower computational cost (FLOPs), lower memory usage, fewer trainable parameters, shorter training time and relatively higher classification accuracy (i.e., lower classification error rate) compared with existing conventional and low-cost incremental learning methods.
  • Figure 2: Results of sensitivity-induced (top) and amplitude-induced (bottom) knowledge intensity analysis using a pre-trained ResNet-18 (He et al., 2016) on the TinyImageNet (Le and Yang, 2015) dataset. Left: The sensitivity and amplitude of the $9$ positions in the $3 \times 3$ convolution kernel across all blocks within the network. Right: A detailed display of the sensitivity and amplitude across all layers in the last block.
  • Figure 3: Illustration of the proposed method workflow. The central elements in the trainable layers are decoupled from the original convolution kernels into new $1 \times 1$ kernels placed on the side of the main network, which undergo independent gradient computation and back-propagation, significantly reducing training resource overhead. Then, the dynamic channel element selection strategy selects a proportion $s$ of central element channels that are more sensitive to the incoming new data, further alleviating the optimization burden. Besides, facilitated by the dynamic channel element selection, an efficient sparse orthogonal gradient projection is introduced to constrain parameter updates, effectively mitigating catastrophic forgetting.
  • Figure 4: Visualization of the channel sensitivity analysis results for a trainable convolutional layer on the CIFAR-100 (left) and TinyImageNet (right) datasets. The results suggest that different channels of the convolutional layer exhibit varying levels of sensitivity to different data.
  • Figure 5: Performance with different numbers of trainable final layers on the TinyImageNet dataset.
  • ...and 5 more figures
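
The Figure 3 caption describes selecting a proportion $s$ of center-element channels and constraining their updates with a sparse orthogonal gradient projection. The sketch below illustrates one way these two steps could fit together; it is an illustration under stated assumptions, not the paper's algorithm. The per-channel sensitivity scores are taken as given inputs (how the paper computes them is not stated in this section), each channel is represented by a single gradient entry, and the "old-task directions" are assumed to be vectors that updates should stay orthogonal to. All function names are hypothetical.

```python
import math


def select_channels(sensitivity, s):
    """Return the sorted indices of the top ceil(s * C) channels by sensitivity score."""
    k = max(1, math.ceil(s * len(sensitivity)))
    ranked = sorted(range(len(sensitivity)),
                    key=lambda c: sensitivity[c], reverse=True)
    return sorted(ranked[:k])


def orthonormalize(vectors, eps=1e-12):
    """Gram-Schmidt: build an orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            dot = sum(x * y for x, y in zip(w, b))
            w = [x - dot * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm > eps:  # skip vectors already inside the span
            basis.append([x / norm for x in w])
    return basis


def sparse_orthogonal_projection(grad, old_directions, selected):
    """Restrict grad to the selected channels, then remove its components along
    the old-task directions (restricted to the same channels).
    Unselected channels receive a zero update."""
    g = [grad[c] for c in selected]
    basis = orthonormalize([[d[c] for c in selected] for d in old_directions])
    for b in basis:
        dot = sum(x * y for x, y in zip(g, b))
        g = [x - dot * y for x, y in zip(g, b)]
    update = [0.0] * len(grad)
    for c, value in zip(selected, g):
        update[c] = value
    return update


# Hypothetical numbers for a layer with six channels.
sensitivity = [0.1, 0.9, 0.4, 0.7, 0.05, 0.3]
grad = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0]
old_directions = [[0.0, 1.0, 1.0, 0.0, 0.0, 0.0]]  # assumed: one direction to protect

selected = select_channels(sensitivity, s=0.5)
update = sparse_orthogonal_projection(grad, old_directions, selected)
overlap = sum(u * d for u, d in zip(update, old_directions[0]))
print(selected, update, overlap)
```

Restricting the projection to the selected channels keeps the basis small, which is the sense in which the projection is sparse here: the orthogonalization cost scales with the number of selected channels rather than with the full layer width.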