
SPARC: Subspace-Aware Prompt Adaptation for Robust Continual Learning in LLMs

Dinithi Jayasuriya, Sina Tayebati, Davide Ettori, Ranganath Krishnan, Amit Ranjan Trivedi

TL;DR

SPARC addresses continual learning in LLMs by performing prompt tuning inside low-dimensional, PCA-derived subspaces. The framework measures subspace overlap with cosine similarity to decide whether an existing prompt can be reused, and initializes prompts for new tasks in an orthogonal subspace to isolate them, while keeping the base model frozen. Only soft prompts in a small subspace are trained, which allows strong forward and backward transfer and remains compatible with LoRA. Empirical results in domain- and task-incremental settings show robust knowledge retention (up to 97% of prior knowledge retained) and competitive accuracy with minimal parameter updates (as little as 0.04% of the model's parameters, or 1% when combined with LoRA) on benchmarks such as SuperGLUE.

Abstract

We propose SPARC, a lightweight continual learning framework for large language models (LLMs) that enables efficient task adaptation through prompt tuning in a lower-dimensional space. By leveraging principal component analysis (PCA), we identify a compact subspace of the training data. Optimizing prompts in this lower-dimensional space enhances training efficiency, as it focuses updates on the most relevant features while reducing computational overhead. Furthermore, since the model's internal structure remains unaltered, the extensive knowledge gained from pretraining is fully preserved, ensuring that previously learned information is not compromised during adaptation. Our method achieves high knowledge retention in both task-incremental and domain-incremental continual learning setups while fine-tuning only 0.04% of the model's parameters. Additionally, by integrating LoRA, we enhance adaptability to computational constraints, allowing for a tradeoff between accuracy and training cost. Experiments on the SuperGLUE benchmark demonstrate that our PCA-based prompt tuning combined with LoRA maintains full knowledge retention while improving accuracy, utilizing only 1% of the model's parameters. These results establish our approach as a scalable and resource-efficient solution for continual learning in LLMs.
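
The idea of optimizing prompts in a PCA-derived subspace can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `pca_basis` and `build_prompt`, the sizes `d`, `k`, and `m`, and the random stand-in for training embeddings are all assumptions made for illustration. The sketch assumes that a prompt is formed by multiplying a small trainable matrix of soft tokens by a fixed PCA basis, so only the soft tokens would be updated during training.

```python
import numpy as np


def pca_basis(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Return the top-k principal directions (k x d) of the row vectors."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal principal directions, sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]


def build_prompt(soft_tokens: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Map low-dimensional soft tokens (m x k) into embedding space (m x d)."""
    return soft_tokens @ basis


# Hypothetical sizes: embedding width d, subspace size k, prompt length m.
d, k, m = 64, 8, 4
rng = np.random.default_rng(0)

# Stand-in for token embeddings of a task's training data (assumption).
embeddings = rng.normal(size=(200, d))

basis = pca_basis(embeddings, k)                    # fixed after PCA
soft_tokens = rng.normal(scale=0.02, size=(m, k))   # the only trainable part
prompt = build_prompt(soft_tokens, basis)           # fed to the frozen model

trainable_params = soft_tokens.size  # m * k
full_prompt_params = m * d           # what ordinary prompt tuning would train
print(prompt.shape, trainable_params, full_prompt_params)
```

Under these assumed sizes the trainable prompt shrinks from `m * d` to `m * k` values, which is the kind of reduction the abstract attributes to working in the lower-dimensional space. The actual ratios reported in the paper depend on the model and settings used there.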

Paper Structure

This paper contains 22 sections, 9 equations, 5 figures.

Figures (5)

  • Figure 1: Overview of SPARC. (a) The subspace of the new dataset is computed using PCA. By measuring the cosine similarity between this new subspace and previously learned prompt subspaces, the framework determines whether a similar prompt already exists. If a match is found, the existing prompt is reused for initialization, enhancing knowledge retention. Otherwise, a new prompt is initialized in a subspace orthogonal to the existing prompts, ensuring differentiation and efficient adaptation. (b) The prompt embeddings consist of two key components: tunable soft tokens and a PCA-based transformation matrix. This design significantly reduces the number of trainable parameters compared to traditional prompt tuning methods, making the approach more efficient while preserving model adaptability.
  • Figure 2: Per-token accuracy with varying numbers of soft tokens and principal components. (a) 100 principal components, 20 soft tokens. (b) 300 principal components, 20 soft tokens. (c) 100 principal components, 40 soft tokens. (d) Accuracy with different numbers of PCA components.
  • Figure 3: Accuracy in domain-incremental learning. (a) Non-continual learning accuracy is calculated by fine-tuning the model individually on each dataset, while final accuracy is obtained by sequentially fine-tuning the model across datasets. (b) Forgetting ratio for each dataset after the completion of sequential training.
  • Figure 4: Performance Comparison of Task-Incremental Learning Methods: Accuracy results for three approaches, PCA-Based Learning, Full Finetuning, and LoRA-Integrated Prompt-Based Learning.
  • Figure 5: Task Incremental Learning: Accuracy comparison of PCA-Based Continual Learning, PCA-Based Non-Continual Learning, Full Finetuning, and Zero-Shot Inference.
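
The reuse-or-isolate decision described in the Figure 1 caption can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the caption does not give the exact similarity formula, so the sketch uses the mean squared cosine of the principal angles between two subspaces as a stand-in; the `threshold` value and the names `subspace_overlap`, `select_prompt`, and `orthogonal_complement_init` are hypothetical.

```python
import numpy as np


def subspace_overlap(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared cosine of the principal angles between two subspaces.

    a and b are (k x d) matrices with orthonormal rows. The singular values
    of a @ b.T are the cosines of the principal angles, so the squared
    Frobenius norm is the sum of squared cosines. Result lies in [0, 1].
    """
    k = min(a.shape[0], b.shape[0])
    return float(np.linalg.norm(a @ b.T, "fro") ** 2 / k)


def select_prompt(new_basis, stored_bases, threshold=0.5):
    """Return the index of the most similar stored subspace, or None."""
    if not stored_bases:
        return None
    scores = [subspace_overlap(new_basis, b) for b in stored_bases]
    best = int(np.argmax(scores))
    return best if scores[best] >= threshold else None


def orthogonal_complement_init(existing, m, rng):
    """Draw m random vectors and remove their components along `existing`.

    Assumes the rows of `existing` (r x d) are linearly independent.
    """
    q, _ = np.linalg.qr(existing.T)  # d x r, orthonormal columns
    draw = rng.normal(size=(m, existing.shape[1]))
    return draw - (draw @ q) @ q.T


# Toy subspaces in a 6-dimensional space (hypothetical data).
basis_a = np.eye(6)[0:2]
basis_b = np.eye(6)[2:4]
basis_c = np.eye(6)[4:6]

reuse_index = select_prompt(basis_a, [basis_b, basis_a])  # matches entry 1
no_match = select_prompt(basis_c, [basis_a, basis_b])     # nothing similar

rng = np.random.default_rng(0)
existing = np.vstack([basis_a, basis_b])
new_prompt = orthogonal_complement_init(existing, 3, rng)
print(reuse_index, no_match, np.abs(new_prompt @ existing.T).max())
```

When `select_prompt` returns an index, the corresponding stored prompt would serve as the initialization for the new task; when it returns `None`, the new prompt starts in the orthogonal complement of the existing ones so that training it is less likely to interfere with earlier tasks.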