Adaptive Rank, Reduced Forgetting: Knowledge Retention in Continual Learning Vision-Language Models with Dynamic Rank-Selective LoRA

Haodong Lu, Chongyang Zhao, Jason Xue, Lina Yao, Kristen Moore, Dong Gong

TL;DR

The paper tackles continual learning for vision–language PTMs, identifying limitations of replay and task-specific modules. It introduces CoDyRA, a dynamic rank-selective LoRA framework that jointly learns task representations while minimizing LoRA ranks via sparsity regularization, and merges updates back into pretrained weights with no inference overhead. Through systematic analyses and extensive experiments on MTIL and X-TAIL benchmarks, CoDyRA demonstrates state-of-the-art or competitive performance in both transferring to unseen domains and retaining prior knowledge, without storing past data. The approach shows strong generalization across VLMs (including CLIP and BLIP) and offers a scalable, data-free pathway for continual learning in large multimodal models.

Abstract

Continual learning (CL) aims to accumulate knowledge from sequential tasks without catastrophic forgetting. Vision-language models such as CLIP, with strong generalization, are widely used for CL. Existing methods often adapt isolated PTM components, increasing inference complexity and limiting model improvement, or rely on replay, stored data, or assumptions, leading to high costs and limited applicability. To advance models as continual learners, we explore CL through natural and efficient PTM updates rather than complex task-specific additions. We study continual low-rank learning and analyze how LoRA ranks and placements affect learning and forgetting. A higher-rank LoRA improves task learning (plasticity) but increases forgetting, while a lower-rank LoRA enhances stability but limits adaptation. We observe a plasticity-stability balance tied to rank across parameters and tasks, with moderately small ranks maximizing CL benefits. Motivated by this, we propose Continual Dynamic Rank-Selective LoRA (CoDyRA), which continually updates PTMs with LoRA adapters of adaptively optimized ranks. The new-task objective drives learning, while sparsity-promoting regularization minimizes ranks to reduce interference and forgetting, achieving a balance tailored to each parameter and task. Although all parameters are updated, the minimized ranks keep the model close to its prior state while enabling effective new-task learning. CoDyRA performs efficient CL as a sequence of LoRA-based updates without storing past data or relying on assumptions, preserving the original model architecture and adding no inference overhead. Experiments show CoDyRA improves new representations while retaining old knowledge, achieving state-of-the-art results. Code is available at https://github.com/jeff024/codyra.
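The rank-minimizing idea in the abstract can be illustrated with a small sketch. It assumes each LoRA update is written as B · diag(w) · A with one importance weight per rank, and that sparsity is imposed by an L1 penalty applied through soft-thresholding; this is one common way to realize "sparsity-promoting regularization" and may differ from the paper's exact formulation. All names (`soft_threshold`, `lora_delta`, `kappa`, the example weights) are hypothetical.

```python
import numpy as np

def soft_threshold(w, kappa):
    """Proximal step for kappa * ||w||_1: shrink every weight toward zero
    and set weights with magnitude <= kappa exactly to zero."""
    return np.sign(w) * np.maximum(np.abs(w) - kappa, 0.0)

def lora_delta(B, w, A):
    """Low-rank update B @ diag(w) @ A; ranks whose weight is zero drop out."""
    return (B * w) @ A  # broadcasting scales column i of B by w[i]

def active_rank(w):
    """Number of ranks that still contribute to the update."""
    return int(np.count_nonzero(w))

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 4
B = rng.normal(size=(d_out, r))
A = rng.normal(size=(r, d_in))

# Hypothetical importance weights after some new-task training steps.
w = np.array([0.9, 0.05, -0.4, 0.02])

w_sparse = soft_threshold(w, kappa=0.1)  # two small weights become exactly 0
delta = lora_delta(B, w_sparse, A)

print("active ranks:", active_rank(w_sparse))
print("matrix rank of update:", np.linalg.matrix_rank(delta))
```

In this sketch a larger `kappa` prunes more ranks (favoring stability), while a smaller one keeps more ranks active (favoring plasticity), mirroring the trade-off the abstract describes.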

Paper Structure

This paper contains 24 sections, 9 equations, 16 figures, 10 tables.

Figures (16)

  • Figure 1: (a) Continual full fine-tuning leverages reference data to reduce forgetting of pre-trained knowledge (e.g., zero-shot performance [zscl]) but remains costly and prone to forgetting. (b) Task-specific modular methods (e.g., [boostingrail]) add isolated components requiring domain prediction or gating, increasing complexity and limiting adaptability and generalization. (c) Our approach enables efficient, universal continual PTM updating without reference data or task-specific designs, retaining PTM knowledge and improving performance on unseen data.
  • Figure 2: Overview of CoDyRA: we propose dynamic rank-selective LoRA, enabling each pre-trained weight matrix to adaptively add the ranks necessary for downstream adaptation while retaining pre-trained capabilities. After each task, the dynamic rank updates are merged into the pre-trained weights, adding no inference overhead.
  • Figure 3: Task adaptation and zero-shot capability retention after training CLIP with different LoRA insertion points and ranks. Shapes indicate encoders, colors denote transformer modules, and sizes reflect rank values.
  • Figure 5: Visualization and statistical analysis of rank activation on the Aircraft dataset.
  • Figure 6: Visualization and statistical analysis of rank activation on the Oxford Pets dataset.
  • ...and 11 more figures
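
The merge step mentioned in the Figure 2 caption can be sketched as follows. This is a minimal illustration, assuming a single linear layer and the same B · diag(w) · A parameterization as a stand-in for the paper's adapters; `W0`, `merge`, and `forward_with_adapter` are hypothetical names, not the authors' code. It shows why merging adds no inference overhead: the merged matrix has the original shape and reproduces the adapter's output.

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, r = 5, 7, 3

W0 = rng.normal(size=(d_out, d_in))  # stand-in for a pre-trained weight matrix
B = rng.normal(size=(d_out, r))      # LoRA up-projection
A = rng.normal(size=(r, d_in))       # LoRA down-projection
w = np.array([0.7, 0.0, -0.2])       # hypothetical per-rank weights (one pruned)

def forward_with_adapter(x, W0, B, w, A):
    """Training-time path: frozen weight plus a separate low-rank branch."""
    return x @ W0.T + ((x @ A.T) * w) @ B.T

def merge(W0, B, w, A):
    """Fold the low-rank update into the weight; the shape is unchanged."""
    return W0 + (B * w) @ A

x = rng.normal(size=(4, d_in))       # a batch of 4 inputs
W1 = merge(W0, B, w, A)

y_adapter = forward_with_adapter(x, W0, B, w, A)
y_merged = x @ W1.T                  # single matmul, same cost as the original layer

print("outputs match:", np.allclose(y_adapter, y_merged))
```

For the next task, `W1` would play the role of `W0`, so a sequence of tasks becomes a sequence of such merged updates with no extra modules retained.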