LightCL: Compact Continual Learning with Low Memory Footprint For Edge Device

Zeqing Wang, Fei Cheng, Kangye Ji, Bohu Huang

TL;DR

LightCL tackles continual learning (CL) on edge devices by measuring how generalizable each network layer is, using two metrics: learning plasticity ($LP$) and memory stability ($MS$). It freezes the already-generalized lower and middle layers (Maintain Generalizability) and preserves the feature patterns of past tasks through a lightweight feature-map regulation (Memorize Feature Patterns). The approach reduces memory footprint by up to $6.16\times$ while outperforming other state-of-the-art methods, and it is validated on an edge device. This makes on-device CL more practical in resource-constrained environments.

Abstract

Continual learning (CL) is a technique that enables neural networks to constantly adapt to their dynamic surroundings. Although long overlooked, this technology can go a long way toward meeting the customized needs of users on edge devices. However, most CL methods ignore edge scenarios: they consume large amounts of resources during training to acquire generalizability across all tasks and thereby delay forgetting. Therefore, this paper proposes a compact algorithm called LightCL, which evaluates and compresses the redundancy of already generalized components in the structure of the neural network. Specifically, we consider two factors of generalizability, learning plasticity and memory stability, and design a metric for each to quantitatively assess the generalizability of neural networks during CL. This evaluation shows that generalizability varies significantly across the layers of a neural network. Thus, we $\textit{Maintain Generalizability}$ by freezing generalized parts without the resource-intensive training process, and we $\textit{Memorize Feature Patterns}$ by stabilizing the feature extraction of previous tasks, which enhances the generalizability of less-generalized parts at the cost of a little extra memory, far less than the reduction obtained by freezing. Experiments show that LightCL outperforms other state-of-the-art methods and reduces memory footprint by up to $\mathbf{6.16\times}$. We also verify the effectiveness of LightCL on an edge device.
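
The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer names, parameter counts, the weight `regulation_weight`, and the use of a mean squared difference as the regulation term are all assumptions made for illustration. The sketch only shows the general idea that frozen layers drop out of training while a penalty keeps current feature maps close to stored references.

```python
# Illustrative sketch only. Layer names, sizes, the penalty form, and the
# regulation weight are assumptions, not values taken from the paper.

def freeze_lower_layers(layers, num_frozen):
    """Mark the first `num_frozen` layers as non-trainable (frozen)."""
    for index, layer in enumerate(layers):
        layer["trainable"] = index >= num_frozen
    return layers


def trainable_param_count(layers):
    """Count parameters that still need gradients during training."""
    return sum(layer["params"] for layer in layers if layer["trainable"])


def feature_regulation_penalty(current, reference):
    """Mean squared difference between current and stored feature maps.

    Both arguments are flattened feature maps given as lists of floats.
    The squared-error form is an assumed stand-in for the paper's regulation.
    """
    if len(current) != len(reference):
        raise ValueError("feature maps must have the same length")
    return sum((c - r) ** 2 for c, r in zip(current, reference)) / len(current)


# Hypothetical three-block network.
layers = [
    {"name": "lower", "params": 1000},
    {"name": "middle", "params": 4000},
    {"name": "upper", "params": 8000},
]
freeze_lower_layers(layers, num_frozen=2)
trainable = trainable_param_count(layers)

# Hypothetical feature maps: the reference is stored before the new task,
# the current one is produced while training on the new task.
reference = [0.25, 0.5, 1.0, 0.0]
current = [0.25, 1.0, 0.5, 0.0]
penalty = feature_regulation_penalty(current, reference)

task_loss = 0.5          # hypothetical loss on the new task
regulation_weight = 2.0  # hypothetical trade-off coefficient
total_loss = task_loss + regulation_weight * penalty

print(trainable, penalty, total_loss)
```

In this toy setup only the upper block stays trainable, so gradients are needed for 8000 of the 13000 parameters, and the penalty grows as the current feature maps drift away from the stored references.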

Paper Structure (19 sections, 6 equations, 5 figures, 2 tables, 1 algorithm)

Figures (5)

  • Figure 1: Evaluation of accuracy and memory footprint on Split Tiny-ImageNet under the TIL setting with a sparsity ratio of $90\%$. LightCL outperforms the state-of-the-art efficient CL method SparCL and reduces memory footprint by up to $6.16\times$ compared with SparCL$_{DER++}$.
  • Figure 2: Evaluation of MS and LP on the Split Tiny-ImageNet dataset with ResNet-18 (first row) and VGG16 (second row) under the CIL setting. The x-axis is the index of each layer in the given network. We run two consecutive CL tasks, each containing the same number of classes, to evaluate the metrics of each layer. Line colors distinguish tests with different numbers of classes. All tests are run $5$ times.
  • Figure 3: Illustration of each component in LightCL. Before training a new task, we obtain feature map standards as references for later regulation and store them in memory, as shown in the bottom part of the figure. During the training process, shown in the top part of the figure, we Maintain Generalizability by freezing lower and middle layers, and Memorize Feature Patterns by regulation between current feature maps and stored feature map standards.
  • Figure 4: Ablation experiments on the Split CIFAR-10 dataset under the TIL setting across 5 runs. Red bars denote the setting without the pre-trained model and without Maintain Generalizability (freezing layers). Blue bars denote the setting with only the pre-trained model. Green bars denote the setting with both.
  • Figure 5: We deploy LightCL (top figure) and SparCL$_{DER++}$ (bottom figure) on a Jetson Nano and track how the memory footprint changes during CL. Only $5$ iterations are shown.