
Merge and Bound: Direct Manipulations on Weights for Class Incremental Learning

Taehoon Kim, Donghwan Jang, Bohyung Han

TL;DR

This paper tackles catastrophic forgetting in Class Incremental Learning (CIL) by introducing Merge-and-Bound (M&B), a plug‑in training strategy that directly manipulates network weights. It uses inter‑task weight merging to form a base model across all previous stages and intra‑task weight merging to merge weights along the current task's training trajectory, complemented by a bounded update that keeps weight changes close to the base model. The approach integrates with existing CIL methods and yields consistent performance gains on CIFAR‑100 and ImageNet‑100/1000 benchmarks, with pronounced improvements as the number of tasks grows and under limited memory budgets. Overall, M&B enhances stability and plasticity in continual learning by promoting reliable weight merging and preserving prior representations, offering a practical, low‑cost enhancement for real‑world continual learning deployments.
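As a rough illustration of how the base model could be assembled, the sketch below keeps a uniform running mean of the feature-extractor weights from all finished stages and concatenates classifier rows. It assumes weights are flattened into plain Python lists, that "averaging the weights of models from all previous stages" means an equal-weight cumulative mean, and that the classifier is a list of per-class weight rows with biases omitted. `update_base_extractor`, `build_base_classifier`, and `new_class_rows` are hypothetical names, not the authors' code.

```python
from typing import List

Vector = List[float]


def update_base_extractor(theta_base_k: Vector, theta_k: Vector, k: int) -> Vector:
    """Return theta_{k+1}^base as the running mean of theta_1..theta_k.

    Assumption: theta_base_k already holds the mean of theta_1..theta_{k-1}
    (its value is ignored when k == 1), so an equal-weight cumulative
    average is maintained without storing every past model.
    """
    if k < 1:
        raise ValueError("stage index k starts at 1")
    return [((k - 1) * b + w) / k for b, w in zip(theta_base_k, theta_k)]


def build_base_classifier(phi_base_k: List[Vector], phi_k: List[Vector],
                          new_class_rows: List[int]) -> List[Vector]:
    """Concatenate the base classifier's rows (old classes) with the rows
    of the current classifier that belong to the classes of task k.

    Hypothetical layout: one weight row per class, biases omitted.
    """
    old_rows = [row[:] for row in phi_base_k]
    new_rows = [phi_k[i][:] for i in new_class_rows]
    return old_rows + new_rows
```

Under this assumption, starting from `base = [1.0, 0.0]` (that is, theta_2^base equals theta_1) and calling `update_base_extractor(base, [3.0, 2.0], 2)` and then `update_base_extractor(base, [5.0, 4.0], 3)` yields `[3.0, 2.0]`, the plain mean of the three stage weights, without keeping older checkpoints around.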

Abstract

We present a novel training approach, named Merge-and-Bound (M&B), for Class Incremental Learning (CIL), which directly manipulates model weights in the parameter space for optimization. Our algorithm involves two types of weight merging: inter-task weight merging and intra-task weight merging. Inter-task weight merging unifies previous models by averaging the weights of models from all previous stages. Intra-task weight merging, on the other hand, facilitates learning of the current task by combining model parameters within the current stage. For reliable weight merging, we also propose a bounded update technique that optimizes the target model with minimal cumulative updates and preserves knowledge from previous tasks; this strategy shows that effective new models can be obtained near old ones, reducing catastrophic forgetting. M&B integrates seamlessly into existing CIL methods without modifying architecture components or revising learning objectives. We extensively evaluate our algorithm on standard CIL benchmarks and demonstrate superior performance compared to state-of-the-art methods.
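The following is a minimal sketch of intra-task merging combined with a bounded update, under stated assumptions. Intra-task merging is modeled as an equal-weight running average of weight snapshots along one stage's trajectory (in the spirit of stochastic weight averaging), and the bound is modeled as a Euclidean projection onto a ball of radius `radius` around the base model after every step. The projection is a stand-in chosen for clarity; the abstract only states that the bounded update seeks minimal cumulative updates, so the actual mechanism may differ. All function, class, and parameter names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


class IntraTaskAverager:
    """Equal-weight running mean of weight snapshots within one stage
    (an SWA-style stand-in for intra-task weight merging)."""

    def __init__(self) -> None:
        self.mean: Vector = []
        self.count = 0

    def add(self, weights: Vector) -> None:
        self.count += 1
        if self.count == 1:
            self.mean = list(weights)
        else:
            n = self.count
            # Incremental mean update: m_n = m_{n-1} + (w - m_{n-1}) / n
            self.mean = [m + (w - m) / n for m, w in zip(self.mean, weights)]


def bound_to_base(weights: Vector, base: Vector, radius: float) -> Vector:
    """Project weights onto the L2 ball of the given radius centred at the
    base model (hypothetical bounding rule used only for illustration)."""
    delta = [w - b for w, b in zip(weights, base)]
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= radius:
        return list(weights)
    scale = radius / norm
    return [b + scale * d for b, d in zip(base, delta)]


def train_stage(base: Vector, grads: List[Vector], lr: float,
                radius: float, snapshot_every: int = 1) -> Vector:
    """Toy SGD loop: start from the base model, take bounded steps, and
    return the intra-task merged weights used for inference."""
    weights = list(base)
    averager = IntraTaskAverager()
    for step, g in enumerate(grads, start=1):
        weights = [w - lr * gi for w, gi in zip(weights, g)]
        weights = bound_to_base(weights, base, radius)
        if step % snapshot_every == 0:
            averager.add(weights)
    # Fall back to the last iterate if no snapshot was taken.
    return averager.mean if averager.count else weights
```

For example, starting from base `[0.0]` with three gradients of `[-1.0]`, `lr=1.0` and `radius=1.5`, the bounded weights after each step are 1.0, 1.5 and 1.5, and the merged model returned for inference is their mean, about 1.33. Averaging inside the bounded region keeps every snapshot near the base model, which is one plausible reading of why bounding makes merging more reliable.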

Paper Structure

This paper contains 27 sections, 4 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Description of inter-task weight merging: Upon completion of the ${k}^{\text{th}}$ incremental stage, we establish the base model $M^{\text{base}}_{k+1}(\cdot)$, which serves as the initialization point for the $(k+1)^{\text{th}}$ stage. The model comprises a feature extractor $f_{\theta_{k+1}^{\text{base}}}(\cdot)$ and a classifier $g_{\phi_{k+1}^{\text{base}}}(\cdot)$, which are constructed as follows. (a) To construct the base feature extractor $f_{\theta_{k+1}^{\text{base}}}(\cdot)$, we set ${\theta_{k+1}^{\text{base}}}$ to the moving average of all previous feature extractor weights, $\theta_1, \theta_2, \cdots, \theta_{k}$, which is computed recursively from $\theta_k^{\text{base}}$ and $\theta_k$ following Equation \ref{eq:feature_extractor_base}. (b) To construct the base classifier $g_{\phi_{k+1}^{\text{base}}}(\cdot)$, we concatenate the weights of the current base classifier $\phi_{k}^{\text{base}}$ with the weights of the current classifier ${\phi_{k}}$ associated with the class set of the current task, $\mathcal{C}_{k}$.
  • Figure 2: (a) Illustration of intra-task weight merging: we merge weights by taking a moving average of multiple models along the training trajectory, as described in Equation \ref{eq:intra_task_averaging}. The intra-task merged model is used for inference and for computing the next stage's base model $M_{k+1}^{\text{base}}(\cdot)$. (b) Illustration of the bounded update technique: we constrain the weight updates to stay around the base model $M_k^{\text{base}}(\cdot)$. This strategy is designed to preserve the knowledge in the base model while still searching unexplored regions of the parameter space during optimization.
  • Figure 3: We measure the cosine similarities between all pairs of model update vectors produced in each stage. The model update vectors become positively correlated when M&B is incorporated.
  • Figure 4: Centered Kernel Alignment (CKA) between models after training on individual incremental stages. We visualize the similarity between pairs of models obtained from two different tasks (a baseline task and a comparison task) by measuring the CKA of the representations of test examples from all classes learned up to the baseline task, as extracted by the two models.
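The two analyses in Figures 3 and 4 can be approximated with the helpers below. This is a sketch under assumptions: each stage's update vector is taken to be the difference between the flattened weights at the end and at the start of that stage, and CKA is the linear variant computed from column-centered feature matrices whose rows are the same test examples. The paper may use a different CKA kernel or update definition, and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def stage_update(start: List[float], end: List[float]) -> List[float]:
    """Update vector of one stage: end-of-stage minus start-of-stage weights."""
    return [e - s for s, e in zip(start, end)]


def cosine_similarity_matrix(updates: List[List[float]]) -> Matrix:
    """Pairwise cosine similarity between update vectors.
    Assumes every update vector is nonzero."""
    norms = [math.sqrt(sum(x * x for x in u)) for u in updates]
    return [[sum(a * b for a, b in zip(u, v)) / (nu * nv)
             for v, nv in zip(updates, norms)]
            for u, nu in zip(updates, norms)]


def _center_columns(x: Matrix) -> Matrix:
    """Subtract each feature's mean over the examples (rows)."""
    n = len(x)
    means = [sum(col) / n for col in zip(*x)]
    return [[v - m for v, m in zip(row, means)] for row in x]


def _cross_frob_sq(a: Matrix, b: Matrix) -> float:
    """Squared Frobenius norm of a^T b, where rows are examples."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(p * q for p, q in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA between two representations of the same n examples:
    ||Yc^T Xc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F)."""
    xc, yc = _center_columns(x), _center_columns(y)
    cross = _cross_frob_sq(xc, yc)
    return cross / math.sqrt(_cross_frob_sq(xc, xc) * _cross_frob_sq(yc, yc))
```

Linear CKA is invariant to isotropic scaling and orthogonal transformations of the features, so a value near 1 indicates that two stages' representations differ mostly by such transformations, not that the underlying weights are identical.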