Efficient Continual Learning through Frequency Decomposition and Integration

Ruiqi Liu, Boyu Diao, Libo Huang, Hangda Liu, Chuanguang Yang, Zhulin An, Yongjun Xu

TL;DR

This paper tackles catastrophic forgetting in continual learning under resource constraints by introducing FDINet, a framework that decomposes inputs into low- and high-frequency streams via the Discrete Wavelet Transform and processes them with two lightweight networks. Through mutual frequency integration, FDINet enables efficient rehearsal-based learning by preserving global structure with low-frequency information and retaining class-specific details with high-frequency signals, while compressing both input and model. Empirical results across CIFAR-10, Tiny ImageNet, and ImageNet-R show up to a $7.49\%$ accuracy gain over SOTA, up to $78\%$ fewer backbone parameters, $80\%$ lower peak memory, and up to $5\times$ faster training on edge devices, demonstrating strong practical impact for edge continual learning. The framework is shown to generalize across different rehearsal methods (e.g., ER, DER++, CLS-ER) and efficient CL baselines, highlighting its potential as a unified approach to both accelerate training and mitigate forgetting in dynamic data streams.

Abstract

Continual learning (CL) aims to learn new tasks while retaining past knowledge, addressing the challenge of forgetting during task adaptation. Rehearsal-based methods, which replay previous samples, effectively mitigate forgetting. However, research on enhancing the efficiency of these methods, especially in resource-constrained environments, remains limited, hindering their application in real-world systems with dynamic data streams. The human perceptual system processes visual scenes through complementary frequency channels: low-frequency signals capture holistic cues, while high-frequency components convey structural details vital for fine-grained discrimination. Inspired by this, we propose the Frequency Decomposition and Integration Network (FDINet), a novel framework that decomposes and integrates information across frequencies. FDINet designs two lightweight networks to independently process low- and high-frequency components of images. When integrated with rehearsal-based methods, this frequency-aware design effectively enhances cross-task generalization through low-frequency information, preserves class-specific details using high-frequency information, and facilitates efficient training due to its lightweight architecture. Experiments demonstrate that FDINet reduces backbone parameters by 78%, improves accuracy by up to 7.49% over state-of-the-art (SOTA) methods, and decreases peak memory usage by up to 80%. Additionally, on edge devices, FDINet accelerates training by up to 5$\times$.
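
The frequency decomposition described above can be illustrated with a single-level 2D Haar wavelet transform. This is a minimal sketch, not the authors' implementation: the choice of the Haar wavelet, the function names `haar_dwt2` and `decompose`, and the decision to stack the three detail bands along the channel axis are all assumptions made for illustration. It shows how one DWT level halves the spatial resolution of both streams, which is where the input compression comes from.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level orthonormal 2D Haar DWT (assumed wavelet, for illustration).

    img: array of shape (H, W, C) with even H and W.
    Returns the approximation band and three detail bands, each (H/2, W/2, C).
    """
    a = img[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0     # low-pass in both directions
    detail_1 = (a + b - c - d) / 2.0   # difference between rows
    detail_2 = (a - b + c - d) / 2.0   # difference between columns
    detail_3 = (a - b - c + d) / 2.0   # diagonal difference
    return approx, (detail_1, detail_2, detail_3)

def decompose(img):
    """Split an image into a low-frequency and a high-frequency input.

    Stacking the detail bands on the channel axis is an assumption;
    the summary above does not say how the bands are combined.
    """
    approx, details = haar_dwt2(img)
    high = np.concatenate(details, axis=-1)
    return approx, high

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))        # CIFAR-sized dummy image
low, high = decompose(image)
print(low.shape, high.shape)           # (16, 16, 3) (16, 16, 9)
```

Each stream has one quarter of the original spatial positions, so a network consuming either stream processes a much smaller feature map than one fed the full image.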

Paper Structure

This paper contains 22 sections, 5 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: (a) Diagram of the model compression method. (b) Diagram of the input compression method. (c) Diagram of the proposed FDINet. By utilizing frequency decomposition and integration, FDINet compresses both the input and the model. Through the synergistic replay mechanism of high- and low-frequency components, it compensates for the potential performance loss caused by compression.
  • Figure 2: Continual learning results on Split CIFAR-10 (5 tasks) using ER-ACE (Caccia et al., 2021). Images were pre-processed with high-pass and low-pass filters. Left: Gradient-based attention maps indicating model-attended input pixels. Low-frequency features in the images tend to focus on global structures, while high-frequency features focus more on edge textures and are susceptible to noise. Right: Classification accuracy. Despite filtering out a significant amount of image information, the accuracy degradation is not substantial.
  • Figure 3: The main framework of FDINet. We decompose the original image into frequency components using discrete wavelet transform. Subsequently, two lightweight networks are used to extract features from the reduced-size high-frequency and low-frequency inputs, respectively. To better leverage the information from different frequencies, we design feature aggregators to integrate the intermediate frequency features from both lightweight networks.
  • Figure 4: Design choices for frequency integration operations in feature aggregators.
  • Figure 5: Comparison of Class-IL accuracy of different methods on the Split ImageNet-R dataset. The values in parentheses in the legend indicate the average accuracy.
  • ...and 3 more figures
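
The low-pass and high-pass pre-processing mentioned in the Figure 2 caption can be sketched as follows. The caption does not specify the filter, so this example assumes an ideal circular mask in the Fourier domain; the function name `split_frequencies` and the `radius` cutoff are hypothetical and chosen only for illustration.

```python
import numpy as np

def split_frequencies(gray, radius):
    """Split a 2D grayscale image into low- and high-frequency parts.

    Assumes an ideal circular low-pass mask of the given radius (in
    frequency bins) centred on the DC component; the high-pass part is
    everything outside that mask.
    """
    h, w = gray.shape
    # Move the zero-frequency component to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)
    mask = dist <= radius
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~mask)).real
    return low, high

rng = np.random.default_rng(0)
image = rng.random((32, 32))
low, high = split_frequencies(image, radius=6)
# The two parts are complementary: together they rebuild the input.
print(np.allclose(low + high, image))  # True
```

A small `radius` keeps only coarse structure in `low` and pushes edges and texture into `high`, which matches the qualitative split the caption describes.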