BoRA: Bi-dimensional Weight-Decomposed Low-Rank Adaptation

Qiushi Wang, Yuchen Fan, Junwei Bao, Hongfei Jiang, Yang Song

TL;DR

BoRA advances parameter-efficient fine-tuning by applying symmetric weight updates along both the row and column dimensions. Building on LoRA and DoRA, it introduces separate magnitude parameters for rows and columns, each paired with its own normalization step. Across NLG and NLU benchmarks, BoRA consistently outperforms LoRA and DoRA with a modest increase in trainable parameters. The authors acknowledge limitations concerning language diversity, smaller models, and training time.

Abstract

In recent years, Parameter-Efficient Fine-Tuning (PEFT) methods like Low-Rank Adaptation (LoRA) have significantly enhanced the adaptability of large-scale pre-trained models. Weight-Decomposed Low-Rank Adaptation (DoRA) improves upon LoRA by separating the magnitude and direction components of the weight matrix, leading to superior performance. However, DoRA's improvements are limited to the vertical dimension, resulting in an asymmetrical pattern between horizontal and vertical dimensions. This paper introduces BoRA, an innovative extension of LoRA and DoRA, characterized by symmetrical properties across horizontal and vertical dimensions. Our approach optimizes the weight matrix symmetrically by adjusting both column-wise and row-wise magnitudes. Extensive experiments demonstrate that BoRA surpasses state-of-the-art PEFT methods, including LoRA and DoRA, achieving superior results across various benchmarks.
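To make the contrast between column-only and bi-dimensional decomposition concrete, here is a minimal NumPy sketch. The DoRA function follows the standard formulation (a trainable magnitude vector scaling the column-normalized weight). The bi-dimensional function is only an illustration of "column-wise and row-wise magnitudes": the order of the two normalization steps, the initialization, and all function and variable names are assumptions, not the paper's exact formulation.

```python
import numpy as np


def column_norms(w):
    # L2 norm of each column, shape (1, k)
    return np.linalg.norm(w, axis=0, keepdims=True)


def row_norms(w):
    # L2 norm of each row, shape (d, 1)
    return np.linalg.norm(w, axis=1, keepdims=True)


def dora_weight(w0, a, b, m_col):
    # DoRA: normalize each column of W0 + BA, then rescale by a
    # trainable column magnitude.
    v = w0 + b @ a
    return m_col * v / column_norms(v)


def bidimensional_weight(w0, a, b, m_col, m_row):
    # Hypothetical bi-dimensional variant (ASSUMED ordering: columns
    # first, then rows). Each step normalizes along one dimension and
    # rescales by that dimension's trainable magnitude.
    col_scaled = dora_weight(w0, a, b, m_col)
    return m_row * col_scaled / row_norms(col_scaled)


rng = np.random.default_rng(0)
d, k, r = 4, 3, 2
w0 = rng.normal(size=(d, k))   # frozen pre-trained weight
a = rng.normal(size=(r, k))    # low-rank factor A
b = np.zeros((d, r))           # low-rank factor B, zero at initialization

# Assumed initialization: magnitudes start at the norms of W0, so the
# adapted weight equals W0 before any training step.
m_col = column_norms(w0)
m_row = row_norms(w0)
w_adapted = bidimensional_weight(w0, a, b, m_col, m_row)
print(np.allclose(w_adapted, w0))
```

In this sketch the row magnitudes add `d` trainable entries on top of the `k` column magnitudes that DoRA already uses, which is small compared with the `r * (d + k)` entries in the low-rank factors.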

Paper Structure

This paper contains 22 sections, 6 equations, 3 figures, and 11 tables.

Figures (3)

  • Figure 1: Structure of BoRA: blue indicates frozen parameters, green indicates trainable parameters. Low-rank adaptation matrices and two independent magnitude matrices ensure symmetrical adjustment of the weight matrix.
  • Figure 2: Magnitude and direction updates from time interval $t-1$ to $t$, examining full parameter tuning (a, e) and three PEFT methods: LoRA (b, f), DoRA (c, g), and BoRA (d, h). Each method is evaluated in two dimensions: column-wise and row-wise. Green represents direction changes, while blue represents magnitude changes.
  • Figure 3: Magnitude and direction updates post-training, examining full parameter tuning (a, e) and three PEFT methods: LoRA (b, f), DoRA (c, g), and BoRA (d, h). Each method is evaluated in two dimensions: column-wise and row-wise. Green represents direction changes, while blue represents magnitude changes.