SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models

Yuxuan Zhang

TL;DR

SECURA tackles the infeasibility of full fine-tuning for very large language models and the persistent problem of catastrophic forgetting in PEFT by combining CABR-LoRA inverse decomposition with Sigmoid-based Magnitude Norm (S-MagNorm). The approach preserves core knowledge while adapting to new tasks through a dynamic, Sigmoid-constrained update and two merge strategies that balance retention with performance gains. Across five backbones and 18 tasks, SECURA delivers consistent improvements over DoRA and LoRA on MCQ and QA benchmarks, and achieves state-of-the-art knowledge retention in continual-learning tests, maintaining substantial pre-trained knowledge while enabling continued learning. These results suggest SECURA provides a practical, scalable path for efficient LLM fine-tuning that minimizes forgetting without requiring access to prior data.

Abstract

With the rapid development of large language models (LLMs), full fine-tuning (FT) of these models is becoming increasingly infeasible due to high computational demands. Moreover, FT increases the risk of catastrophic forgetting. As an alternative, Low-Rank Adaptation (LoRA) has been proposed. By fine-tuning only a small subset of parameters, LoRA achieves performance similar to FT while significantly reducing resource requirements. However, since LoRA inherits FT's design, catastrophic forgetting remains an issue. To address these limitations, we propose SECURA: Sigmoid-Enhanced CUR Decomposition LoRA, a novel parameter-efficient fine-tuning (PEFT) variant designed to mitigate catastrophic forgetting while improving fine-tuning performance. Our method introduces a new normalization technique, Sigmoid-based Magnitude Norm (S-MagNorm), which enhances parameter retention and fine-tuning efficiency. SECURA is evaluated on a diverse range of tasks, including mathematical problem-solving (GSM8K), complex question-answering (CNNDM), translation (NewsDE), and complex multiple-choice reasoning (LogiQA). Experimental results show that it achieves an average fine-tuning improvement of 3.59% across four MCQ tasks and 2.51% across five QA tasks on Gemma2 2B, Qwen2 1.5B, Qwen2 7B, Llama3 8B, and Llama3.1 8B, outperforming DoRA. Additionally, SECURA demonstrates superior knowledge retention, achieving state-of-the-art performance in 16 continual learning tests and maintaining more than 70% accuracy on LLMs' basic knowledge when compared with Experience Replay (ER), sequential learning (SEQ), EWC, I-LoRA, and CUR-LoRA.

Paper Structure

This paper contains 35 sections, 15 equations, 4 figures, 14 tables, 1 algorithm.

Figures (4)

  • Figure 1: S-MagNorm update method. The figure illustrates the process flow of the S-MagNorm normalization algorithm: it starts by fusing the former weight matrix with the CABR module, then computes the ratio loss matrix, applies the normalization and sigmoid steps, and finally limits the ratio loss matrix.
  • Figure 2: Gradient analysis. The gradient variations during training show that LoRA exhibits higher fluctuations, indicating greater parameter reshaping, which may increase the risk of catastrophic forgetting. In contrast, SECURA (CABR-LoRA + S-MagNorm) and CABR-LoRA alone show lower gradient changes and avoid excessive drift, suggesting more stable parameter updates that better mitigate catastrophic forgetting and help find a better optimum for the model's parameters. Additionally, the range and variance comparison shows the essential role of S-MagNorm. Hyperparameter settings are detailed in Appendix \ref{app:hyper_param}.
  • Figure 3: Comparison of weight SVD norm across different methods. The "Basic" method represents the original LLM weights without fine-tuning. After 2000 training steps, DoRA shows a 0.517 decrease in norm, indicating some loss of information. SECURA preserves more knowledge with a minor 0.061 change, while LoRA shows a significant 3.076 increase, reflecting greater weight modification. Hyperparameter settings are detailed in Appendix \ref{app:hyper_param}.
  • Figure 4: Performance of SECURA compared with baselines trained sequentially on 16 tasks using LLaMA-3 8B (left) and Qwen-2 7B (right) backbones, tested under F-task with a learning rate of 1e-5. Detailed results are in Appendix \ref{app:all_results}.
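
The norm comparison described for Figure 3 can be illustrated with a minimal sketch. The captions here do not say which singular-value-based norm the paper uses, so this example assumes the nuclear norm (the sum of singular values). The helper names `nuclear_norm` and `norm_shift`, the matrix shapes, and the random rank-4 LoRA-style update are all hypothetical and are not taken from the paper.

```python
import numpy as np


def nuclear_norm(weight: np.ndarray) -> float:
    """Sum of the singular values of a 2-D weight matrix (assumed norm choice)."""
    singular_values = np.linalg.svd(weight, compute_uv=False)
    return float(singular_values.sum())


def norm_shift(base: np.ndarray, tuned: np.ndarray) -> float:
    """Signed change in nuclear norm from the base weights to the tuned weights."""
    return nuclear_norm(tuned) - nuclear_norm(base)


# Illustrative data only: a random "pre-trained" weight and a hypothetical
# rank-4 LoRA-style update delta = B @ A added on top of it.
rng = np.random.default_rng(0)
base = rng.standard_normal((64, 32))
B = 0.1 * rng.standard_normal((64, 4))
A = 0.1 * rng.standard_normal((4, 32))
tuned = base + B @ A

print("norm of base weights:", nuclear_norm(base))
print("shift after update:  ", norm_shift(base, tuned))
```

Under this assumption, a shift near zero would correspond to the small change reported for SECURA, and a large positive shift to the increase reported for LoRA. The values printed by this sketch come from random matrices and say nothing about the paper's measurements.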