Shared LoRA Subspaces for almost Strict Continual Learning
Prakhar Kaushik, Ankit Vaidya, Shravan Chaudhari, Rama Chellappa, Alan Yuille
TL;DR
Share tackles the problem of efficiently and continually adapting large pretrained models without data replay by learning a single, shared low-rank subspace. It initializes a foundational subspace from existing LoRA adapters via SVD and incrementally expands it with new information, training only lightweight coefficients and analytically updating the subspace to preserve past knowledge. Theoretical analysis provides incremental subspace error bounds, and empirical results show up to $100\times$ parameter reduction and $281\times$ memory savings while achieving performance near jointly trained baselines across vision, language, and multimodal tasks. This approach enables scalable, asynchronous continual learning and model serving by compressing hundreds of adapters into one reusable subspace, reducing resource use and broadening access to continual learning with large models.
Abstract
Adapting large pretrained models to new tasks efficiently and continually is crucial for real-world deployment but remains challenging due to catastrophic forgetting and the high cost of retraining. While parameter-efficient tuning methods like low-rank adaptation (LoRA) reduce computational demands, they lack mechanisms for strict continual learning and knowledge integration that do not rely on data replay or multiple adapters. We propose Share, a novel approach to parameter-efficient continual finetuning that learns and dynamically updates a single, shared low-rank subspace, enabling seamless adaptation across multiple tasks and modalities. Share constructs a foundational subspace that extracts core knowledge from past tasks and incrementally integrates new information by identifying essential subspace directions. Knowledge from each new task is incorporated into this evolving subspace, facilitating forward knowledge transfer while minimizing catastrophic interference. This approach achieves up to $100\times$ parameter reduction and $281\times$ memory savings over traditional LoRA methods while maintaining performance comparable to jointly trained models. A single Share model can replace hundreds of task-specific LoRA adapters, supporting scalable, asynchronous continual learning. Experiments across image classification, natural language understanding, 3D pose estimation, and text-to-image generation validate its effectiveness, making Share a practical and scalable solution for lifelong learning in large-scale AI systems.
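To make the idea of a shared subspace concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each task's adapter is available as a dense low-rank update matrix (the product of its LoRA factors), builds an orthonormal basis from the stacked updates via SVD, represents each task by a small coefficient matrix in that basis, and expands the basis with residual directions when a new task does not fit. The function names (`init_shared_basis`, `fit_coefficients`, `expand_basis`), the choice to work on full update matrices, and the tolerance value are all assumptions made for illustration; the paper's actual factorization and update rule may differ.

```python
import numpy as np


def init_shared_basis(deltas, k):
    """Build a k-dimensional orthonormal basis from a list of weight updates.

    Each update has shape (d_out, d_in); they are stacked column-wise so that
    the top-k left singular vectors capture the directions shared across tasks.
    """
    stacked = np.concatenate(deltas, axis=1)            # (d_out, n_tasks * d_in)
    u, _, _ = np.linalg.svd(stacked, full_matrices=False)
    return u[:, :k]                                     # (d_out, k)


def fit_coefficients(basis, delta):
    """Project one task's update onto the basis (exact least squares,
    because the basis columns are orthonormal)."""
    return basis.T @ delta                              # (k, d_in)


def expand_basis(basis, delta, n_new, tol=1e-8):
    """Append up to n_new directions needed to represent a new update.

    The new directions come from the residual of `delta` outside the current
    subspace, so existing basis columns (and old coefficients) are unchanged.
    """
    residual = delta - basis @ (basis.T @ delta)
    u, s, _ = np.linalg.svd(residual, full_matrices=False)
    keep = u[:, :n_new][:, s[:n_new] > tol]             # drop numerically empty directions
    return np.concatenate([basis, keep], axis=1)


# Toy usage with synthetic updates that share a 4-dimensional subspace.
rng = np.random.default_rng(0)
shared, _ = np.linalg.qr(rng.standard_normal((32, 4)))
task_deltas = [shared @ rng.standard_normal((4, 16)) for _ in range(5)]

basis = init_shared_basis(task_deltas, k=4)
coeffs = [fit_coefficients(basis, d) for d in task_deltas]

# A new task that needs one direction outside the current subspace.
new_delta = task_deltas[0] + rng.standard_normal((32, 1)) @ rng.standard_normal((1, 16))
basis = expand_basis(basis, new_delta, n_new=2)
new_coeff = fit_coefficients(basis, new_delta)
```

In this sketch, earlier tasks remain exactly representable after expansion because their coefficient matrices can simply be zero-padded for the appended directions, which is one simple way to picture how a growing subspace can avoid overwriting past knowledge.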
