ABM-LoRA: Activation Boundary Matching for Fast Convergence in Low-Rank Adaptation

Dongha Lee, Jinhee Park, Minjun Kim, Junseok Kwon

TL;DR

ABM-LoRA addresses the initialization bottleneck in Low-Rank Adaptation by aligning the adapter's activation boundaries with those of the frozen pretrained model, which reduces gradient information loss at the first training step. By minimizing a boundary loss over a representative batch, ABM produces an initialization that preserves gradient directions, enabling faster convergence across language, vision, and multi-task settings. Empirically, ABM-LoRA improves GLUE results for T5-Base, VTAB-1K results for ViT-B/16, and dialogue results for LLaMA2-7B on WizardLM, often matching or surpassing full fine-tuning at lower cost. The work also provides ablations and practical guidance on layer selection, margins, and adapter rank, indicating that the benefits hold across domains.

Abstract

We propose Activation Boundary Matching for Low-Rank Adaptation (ABM-LoRA), a principled initialization strategy that substantially accelerates the convergence of low-rank adapters. While LoRA offers high parameter efficiency, its random initialization restricts gradient updates to a mismatched tangent space, causing significant information loss and hindering early convergence. Our ABM-LoRA addresses this by aligning the adapter's activation boundaries with those of the pretrained model before downstream training, thereby maximizing the projection of full-parameter gradients into the adapter subspace. This alignment sharply reduces information loss at initialization, yields a lower starting loss, and accelerates convergence. We demonstrate ABM-LoRA's effectiveness across diverse architectures and tasks: language understanding (T5-Base on GLUE), dialogue generation (LLaMA2-7B on WizardLM), and vision recognition (ViT-B/16 on VTAB-1K). On VTAB-1K, it achieves the highest accuracy among all methods, with strong gains on structured reasoning tasks requiring geometric understanding.
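
The abstract does not spell out the boundary loss itself, so the following is only a minimal sketch of what "aligning activation boundaries" could look like. It assumes a hinge-style surrogate in the spirit of activation-boundary distillation: wherever the pretrained (teacher) unit is active, the adapted (student) pre-activation is pushed above a margin, and wherever it is inactive, the pre-activation is pushed below the negative margin. The function name `boundary_matching_loss`, the `margin` default, and the squared-hinge form are illustrative assumptions, not the paper's stated definition.

```python
import numpy as np


def boundary_matching_loss(teacher_pre, student_pre, margin=1.0):
    """Hinge-style activation boundary loss (hypothetical form).

    teacher_pre, student_pre: arrays of shape (batch, units) holding
    pre-activation values of the frozen model and the adapted model.
    """
    teacher_pre = np.asarray(teacher_pre, dtype=float)
    student_pre = np.asarray(student_pre, dtype=float)

    # Which teacher units are on the "active" side of their boundary.
    teacher_on = teacher_pre > 0

    # Teacher active: penalize student values below +margin.
    push_up = np.maximum(margin - student_pre, 0.0)
    # Teacher inactive: penalize student values above -margin.
    push_down = np.maximum(margin + student_pre, 0.0)

    per_unit = np.where(teacher_on, push_up, push_down)
    # Squared penalty, summed over units and averaged over the batch.
    return float(np.mean(np.sum(per_unit ** 2, axis=1)))
```

Under this assumed form, the loss is zero exactly when every student pre-activation sits on the same side of the boundary as the teacher's, with at least `margin` of clearance.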

Paper Structure

This paper contains 27 sections, 25 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Top: Conceptual illustration showing how a poorly aligned adapter subspace (left) discards a significant portion of the gradient signal, whereas our ABM initialization (right) aligns the subspace to fully preserve the gradient direction. Bottom: Training and information loss comparison between LoRA and our ABM-LoRA on T5 (GLUE benchmark), highlighting faster convergence and reduced early-stage information loss with ABM-LoRA. Information loss is quantified as the squared Frobenius norm of the components of the full gradient that are discarded when projected onto the adapter’s initial tangent space. See the method section for theoretical analysis.
  • Figure 2: Violin plots of accuracy distributions across ablation factors.
  • Figure A.2: Training and information-loss curves for LoRA vs. ABM-LoRA on T5, shown for four GLUE dev sets (MNLI, SST-2, QNLI, MRPC). ABM-LoRA not only starts from a substantially lower initial training loss, but also reduces both training loss and information loss more steeply than standard LoRA. These consistent gains across diverse GLUE tasks indicate that ABM initialization mitigates early-stage information loss and accelerates convergence.
  • Figure A.3: Training loss curves on VTAB-1K tasks. We show four examples from the 19-task benchmark to illustrate ABM-LoRA's advantages in achieving lower initial training loss and faster early-stage convergence.
  • Figure A.4: Ablation study by adapter rank (r=8 vs. r=16) on dialogue performance with dropout fixed at 0.1. (a,b) MT-Bench scores with zoomed y-axis to highlight small differences. (c,d) Length-controlled win rate (LC) and win rate (WR) percentages.
  • ...and 1 more figures
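
The information-loss quantity described in the Figure 1 caption can be sketched numerically. The sketch below assumes the adapter update has the usual LoRA form ΔW = BA and takes the "tangent space" to be the set of first-order changes reachable from (B, A), that is, all matrices of the form dB·A + B·dA. Under that assumption, the discarded part of a full gradient G is (I − P_B) G (I − P_A), where P_B and P_A project onto the column space of B and the row space of A. The function names and this particular formalization are illustrative assumptions; the paper's method section gives the authoritative definition.

```python
import numpy as np


def orthogonal_projector(basis):
    """Orthogonal projector onto the column space of `basis`.

    Uses the pseudo-inverse, so zero or rank-deficient inputs are handled.
    """
    return basis @ np.linalg.pinv(basis)


def information_loss(grad, B, A):
    """Squared Frobenius norm of the gradient component that lies outside
    the adapter tangent space {dB @ A + B @ dA} (assumed formalization).

    grad: (m, n) full-parameter gradient
    B:    (m, r) LoRA up-projection
    A:    (r, n) LoRA down-projection
    """
    m, n = grad.shape
    P_B = orthogonal_projector(B)    # (m, m), onto col(B)
    P_A = orthogonal_projector(A.T)  # (n, n), onto row(A)
    discarded = (np.eye(m) - P_B) @ grad @ (np.eye(n) - P_A)
    return float(np.sum(discarded ** 2))


# Illustrative usage with random matrices (shapes are arbitrary).
rng = np.random.default_rng(0)
G = rng.standard_normal((6, 5))
A = rng.standard_normal((2, 5))
B_zero = np.zeros((6, 2))            # standard LoRA starts with B = 0
B_rand = rng.standard_normal((6, 2))

loss_zero_B = information_loss(G, B_zero, A)
loss_rand_B = information_loss(G, B_rand, A)
```

With B = 0, the tangent space reduces to matrices whose rows lie in the row space of A, so the loss depends entirely on how well A's rows align with the gradient. This is one way to read the caption's point that a poorly aligned subspace discards much of the gradient signal.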