Null-Space Filtering for Data-Free Continual Model Merging: Preserving Stability, Promoting Plasticity

Zihuan Qiu, Lei Wang, Yang Cao, Runtong Zhang, Bing Su, Yi Xu, Fanman Meng, Linfeng Xu, Qingbo Wu, Hongliang Li

Abstract

Data-free continual model merging (DFCMM) aims to fuse independently fine-tuned models into a single backbone that evolves with incoming tasks without accessing task data. This paper revisits two fundamental desiderata for DFCMM: stability, avoiding interference with earlier tasks, and plasticity, adapting faithfully to each new task. This poses a challenge that existing approaches fail to address: how to bridge data-level desiderata with parameter-space optimization to ensure stability and plasticity in the absence of task data. To this end, we propose NUFILT (NUll-space FILTering), a data-free framework that directly links these desiderata to parameter-space optimization. Our key observation is that task vectors approximately align with representation subspaces, providing structural surrogates for enforcing stability and plasticity. Accordingly, we design a null-space projector that preserves prior responses by filtering overlapping components of new task vectors, ensuring stability. We further introduce a lightweight LoRA adapter that injects complementary task-specific signals to enable plasticity. The adapter is trained with a projection-based surrogate loss that preserves consistency with prior knowledge while introducing novel directions. This joint filtering-adaptation process enables the backbone to absorb new knowledge while retaining existing behaviors, with updates fused back in a layer-wise linear fashion without extra parameters or inference cost. Theoretically, we establish approximate subspace alignment guarantees that justify null-space filtering. Empirically, NUFILT achieves state-of-the-art performance with minimal forgetting on both vision and NLP benchmarks, improving average accuracy by 4-7% over OPCM and WUDI-Merging, while narrowing the gap to fine-tuning and reducing computational overhead. The code is available at: https://github.com/zihuanqiu/NUFILT

Paper Structure

This paper contains 38 sections, 7 theorems, 40 equations, 8 figures, 10 tables, 1 algorithm.

Key Result

Theorem 1

Let $\tau^{(l)} \in \mathbb{R}^{d_o \times d_i}$ be the task vector at layer $l$, and $H \in \mathbb{R}^{N \times d_i}$ a representation matrix of rank $r_d$ with right singular vectors $V_d \in \mathbb{R}^{d_i \times r_d}$. Suppose $\tau^{(l)}$ admits the decomposition $\tau^{(l)} = T_0 + E$ with ... where $\mathcal{A}(V_d^{(l)}, \hat{V}^{(l)}) = \tfrac{1}{r_d}\|\hat{V}^\top V_d\|_F^2$ and $\zeta$ ...
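The subspace affinity $\mathcal{A}$ in Theorem 1 is straightforward to evaluate numerically. The sketch below is only an illustration on synthetic data, under assumptions not stated in the excerpt above: $\hat{V}$ is taken to be the top-$r_d$ right singular vectors of the task vector, and the task vector is constructed so that its rows lie close to the row space of $H$. The helper names `subspace_affinity` and `top_right_singular_vectors` are hypothetical and do not come from the NUFILT repository.

```python
import numpy as np

def subspace_affinity(V_hat: np.ndarray, V_d: np.ndarray) -> float:
    """A(V_d, V_hat) = ||V_hat^T V_d||_F^2 / r_d for orthonormal column bases."""
    r_d = V_d.shape[1]
    return float(np.linalg.norm(V_hat.T @ V_d, ord="fro") ** 2 / r_d)

def top_right_singular_vectors(M: np.ndarray, r: int) -> np.ndarray:
    """Return the top-r right singular vectors of M as columns (d_i x r)."""
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[:r].T

rng = np.random.default_rng(0)
d_o, d_i, N, r_d = 16, 32, 64, 4

# Synthetic representation matrix H (N x d_i) of rank r_d.
H = rng.standard_normal((N, r_d)) @ rng.standard_normal((r_d, d_i))
V_d = top_right_singular_vectors(H, r_d)

# Assumed construction: tau = T0 + E, with the rows of T0 in the row space
# of H and E a small perturbation.
T0 = rng.standard_normal((d_o, N)) @ H
E = 1e-3 * rng.standard_normal((d_o, d_i))
tau = T0 + E
V_hat = top_right_singular_vectors(tau, r_d)

# Affinity of the matched pair versus a random r_d-dimensional subspace.
aligned = subspace_affinity(V_hat, V_d)
random_basis = np.linalg.qr(rng.standard_normal((d_i, r_d)))[0]
unaligned = subspace_affinity(random_basis, V_d)
print(f"matched affinity: {aligned:.4f}, random affinity: {unaligned:.4f}")
```

The affinity lies in $[0, 1]$: it equals 1 when the two subspaces coincide and is about $r_d / d_i$ in expectation for a random subspace, which is the contrast behind the diagonal dominance reported in Figure 2.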

Figures (8)

  • Figure 1: Illustration of data-free continual model merging (DFCMM). At each step, only the current task model and the previously merged model are accessible, and the merging process is performed without access to any data. The merged model is expected to preserve prior knowledge (stability) while adapting efficiently to new tasks (plasticity).
  • Figure 2: Subspace affinity between data and task vectors in ViT-B/16 across eight datasets. Heatmaps show diagonal dominance, and layer-wise empirical cumulative distribution functions (ECDFs) confirm higher affinities for matched pairs.
  • Figure 3: Overview of the NUFILT procedure. ❶ Filtering: the new task vector is processed through a null-space projector that suppresses activations from previous tasks, ensuring stability to past knowledge. ❷ Adapting: within the filter, a lightweight LoRA adapter refines the update for the current task using a data-free objective. ❸ Fusing: the filter, task vector, and LoRA module are merged back into the backbone, keeping the parameter count and inference cost unchanged.
  • Figure 4: Hyper-parameter sensitivity analysis on the 8-task continual merging protocol. Setting $r=0$ corresponds to removing the associated component.
  • Figure 5: Evaluation of stability and plasticity surrogates, and their harmonic mean on the CLIP ViT-B/32 8-task continual merging protocol. Lower values in all metrics indicate better performance.
  • ...and 3 more figures
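The filtering and fusing steps in Figure 3 can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes that the top-$k$ right singular vectors of earlier task vectors serve as a proxy for the input subspaces of earlier tasks, that the total number of retained directions does not exceed $d_i$, and it omits the LoRA adapter entirely. The function `null_space_projector` and the rank `k` are hypothetical names introduced here.

```python
import numpy as np

def null_space_projector(prev_task_vectors, k):
    """Build P = I - V V^T, where V spans the top-k right singular
    directions of each earlier task vector (an assumed proxy for the
    input subspaces of earlier tasks)."""
    bases = []
    for tau in prev_task_vectors:
        _, _, Vt = np.linalg.svd(tau, full_matrices=False)
        bases.append(Vt[:k].T)
    # Orthonormalize the union of the per-task bases.
    V, _ = np.linalg.qr(np.concatenate(bases, axis=1))
    d_i = V.shape[0]
    return np.eye(d_i) - V @ V.T, V

rng = np.random.default_rng(0)
d_o, d_i, k = 8, 32, 3

def low_rank(rank):
    """Synthetic rank-limited task vector of shape (d_o, d_i)."""
    return rng.standard_normal((d_o, rank)) @ rng.standard_normal((rank, d_i))

prev = [low_rank(k), low_rank(k)]   # earlier task vectors
tau_new = low_rank(k)               # incoming task vector

# Filtering: remove the components of tau_new that overlap earlier subspaces.
P, V = null_space_projector(prev, k)
tau_filtered = tau_new @ P

# Fusing: add the filtered update to the previously merged layer weights.
W_prev = rng.standard_normal((d_o, d_i))
W_merged = W_prev + tau_filtered

x_old = V @ rng.standard_normal(V.shape[1])  # input inside the earlier subspace
x_new = rng.standard_normal(d_i)             # generic input

response_on_old = np.linalg.norm(tau_filtered @ x_old)
response_on_new = np.linalg.norm(tau_filtered @ x_new)
drift_on_old = np.linalg.norm(W_merged @ x_old - W_prev @ x_old)
```

In this toy setting the filtered update leaves the layer output unchanged for inputs in the earlier tasks' subspace (stability) while still acting on generic inputs. The directions removed by the filter are what the adapter in step ❷ would have to compensate for, which this sketch does not model.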

Theorems & Definitions (9)

  • Theorem 1: Approximate Subspace Alignment
  • Corollary 1: Data-free upper bound
  • Theorem A.1: Weyl inequality for singular values [weyl1912asymptotische]
  • Theorem A.2: Wedin's Sin-Theta Theorem [wedin1972perturbation]
  • Proposition A.1: Approximate Linear Combination [chengwhoever]
  • Theorem A.3: Approximate Subspace Alignment
  • proof
  • Corollary A.1: Data-free upper bound
  • proof