DoRAN: Stabilizing Weight-Decomposed Low-Rank Adaptation via Noise Injection and Auxiliary Networks

Nghiem T. Diep, Hien Dang, Tuan Truong, Tan Dinh, Huy Nguyen, Nhat Ho

TL;DR

This work proposes DoRAN, a variant of DoRA designed to further stabilize training and improve sample efficiency, and shows that DoRAN consistently outperforms LoRA, DoRA, and other PEFT baselines.

Abstract

Parameter-efficient fine-tuning (PEFT) methods have become the standard paradigm for adapting large-scale models. Among these techniques, Weight-Decomposed Low-Rank Adaptation (DoRA) has been shown to improve both the learning capacity and training stability of the vanilla Low-Rank Adaptation (LoRA) method by explicitly decomposing pre-trained weights into magnitude and directional components. In this work, we propose DoRAN, a new variant of DoRA designed to further stabilize training and boost the sample efficiency of DoRA. Our approach includes two key stages: (i) injecting noise into the denominator of DoRA's weight decomposition, which serves as an adaptive regularizer to mitigate instabilities; and (ii) replacing static low-rank matrices with auxiliary networks that generate them dynamically, enabling parameter coupling across layers and yielding better sample efficiency in both theory and practice. Comprehensive experiments on vision and language benchmarks show that DoRAN consistently outperforms LoRA, DoRA, and other PEFT baselines. These results underscore the effectiveness of combining stabilization through noise-based regularization with network-based parameter generation, offering a promising direction for robust and efficient fine-tuning of foundation models.
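
The two stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the DoRA form of magnitude times column-normalized direction, assumes the injected noise enters as a non-negative scalar `tau` added to the column norm, and assumes a two-layer auxiliary network with a shared first layer and a per-module head. The names `decomposed_weight`, `generate_factor`, `tau`, `embedding`, `shared_a`, `shared_b`, `head_a_query`, and `head_b_query` are hypothetical and do not come from the paper.

```python
import numpy as np


def column_norms(w):
    """Return the L2 norm of every column of w, shape (1, d_in)."""
    return np.linalg.norm(w, axis=0, keepdims=True)


def decomposed_weight(w0, a, b, m, tau=0.0):
    """Magnitude m times direction (w0 + b @ a) / (column norm + tau).

    tau = 0 gives the DoRA form. tau > 0 is an ASSUMED stand-in for the
    noise that DoRAN injects into the denominator.
    """
    v = w0 + b @ a
    return m * v / (column_norms(v) + tau)


def generate_factor(shared, head, embedding, shape):
    """Hypothetical auxiliary network: shared first layer, per-module head."""
    hidden = np.tanh(shared @ embedding)
    return (head @ hidden).reshape(shape)


rng = np.random.default_rng(0)
d_out, d_in, rank, d_emb, d_hidden = 6, 4, 2, 3, 5

w0 = rng.normal(size=(d_out, d_in))   # frozen pre-trained weight
m = column_norms(w0)                  # magnitude initialised from w0, as in DoRA

# Trainable pieces of the auxiliary networks (all names are illustrative).
embedding = rng.normal(size=d_emb)
shared_a = rng.normal(size=(d_hidden, d_emb))   # would be reused across modules
shared_b = rng.normal(size=(d_hidden, d_emb))
head_a_query = rng.normal(size=(rank * d_in, d_hidden))
head_b_query = rng.normal(size=(d_out * rank, d_hidden))

# Low-rank factors are generated rather than stored as free parameters.
a = generate_factor(shared_a, head_a_query, embedding, (rank, d_in))
b = generate_factor(shared_b, head_b_query, embedding, (d_out, rank))

tau = abs(rng.normal(scale=0.1))      # assumed: non-negative Gaussian noise
w_adapted = decomposed_weight(w0, a, b, m, tau)
print(w_adapted.shape)
```

Under these assumptions, a positive `tau` keeps the denominator away from zero, so each column norm of the adapted weight is slightly smaller than its magnitude entry instead of exactly equal to it. Reusing `shared_a` and `shared_b` with different heads for the query and value projections would be one way to obtain the parameter coupling across layers that the abstract mentions.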

Paper Structure

This paper contains 42 sections, 5 theorems, 117 equations, 3 figures, 5 tables.

Key Result

Theorem 1

Under the setting of non-shared structure defined in Eq. (eq:y) and Eq. (eq:dora), the following minimax lower bound of estimating $G_*$ using $\widehat{G}_n$ defined in Eq. (eq:least_squared_estimator) holds for any $r\in\mathbb{N}$: where $\mathbb{E}_{f_{G}}$ denotes the expectation taken w.r.t. the product measure $f^n_G$.

Figures (3)

  • Figure 1: Illustration of DoRAN. The matrices ${\bm{W}}_1^A$ and ${\bm{W}}_1^B$ are shared across query and value projection layers, as well as across all attention heads within the same Transformer block.
  • Figure 2: Average sample efficiency on the commonsense reasoning datasets.
  • Figure 3: Detailed sample efficiency on each commonsense reasoning dataset in the LLaMA-7B setting.

Theorems & Definitions (5)

  • Theorem 1
  • Theorem 2
  • Lemma 1
  • Proposition 1
  • Lemma 2