ABBA-Adapters: Efficient and Expressive Fine-Tuning of Foundation Models

Raghav Singhal, Kaustubh Ponkshe, Rohit Vartak, Praneeth Vepakomma

TL;DR

ABBA is a new PEFT architecture that reparameterizes the weight update as a Hadamard product of two independently learnable low-rank matrices, yielding significantly higher expressivity under the same parameter budget.

Abstract

Large Language Models have demonstrated strong performance across a wide range of tasks, but adapting them efficiently to new domains remains a key challenge. Parameter-Efficient Fine-Tuning (PEFT) methods address this by introducing lightweight, trainable modules while keeping most pre-trained weights fixed. The prevailing approach, LoRA, models updates using a low-rank decomposition, but its expressivity is inherently constrained by the rank. Recent methods like HiRA aim to increase expressivity by incorporating a Hadamard product with the frozen weights, but still rely on the structure of the pre-trained model. We introduce ABBA, a new PEFT architecture that reparameterizes the update as a Hadamard product of two independently learnable low-rank matrices. In contrast to prior work, ABBA fully decouples the update from the pre-trained weights, enabling both components to be optimized freely. This leads to significantly higher expressivity under the same parameter budget, a property we validate through matrix reconstruction experiments. Empirically, ABBA achieves state-of-the-art results on arithmetic and commonsense reasoning benchmarks, consistently outperforming existing PEFT methods by a significant margin across multiple models. Our code is publicly available at: https://github.com/CERT-Lab/abba.

Paper Structure

This paper contains 50 sections, 4 theorems, 38 equations, 4 figures, 10 tables.

Key Result

Theorem 1

Let $B_1 A_1, B_2 A_2 \in \mathbb{R}^{m \times n}$. Then $(B_1 A_1) \odot (B_2 A_2) = \underbrace{(B_1 \odot_r B_2)}_{m \times r_1 r_2} \underbrace{(A_1^\top \odot_r A_2^\top)^\top}_{r_1 r_2 \times n}$, where $\odot_r$ denotes the row-wise Khatri–Rao product.
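
The identity in Theorem 1 can be checked numerically on small matrices. The sketch below is an illustration only, not the authors' implementation. It uses plain Python lists and small random integers so that the comparison is exact, and every helper name (`matmul`, `hadamard`, `row_khatri_rao`, `rand_matrix`) and the dimensions are assumptions chosen for the example.

```python
import random


def matmul(X, Y):
    """Ordinary matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def hadamard(X, Y):
    """Element-wise (Hadamard) product of two equally shaped matrices."""
    return [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def transpose(X):
    return [list(col) for col in zip(*X)]


def row_khatri_rao(X, Y):
    """Row-wise Khatri-Rao product: row i is the Kronecker product of X[i] and Y[i]."""
    return [[x * y for x in rx for y in ry] for rx, ry in zip(X, Y)]


def rand_matrix(rows, cols, rng):
    """Small random integer matrix (integers keep the equality check exact)."""
    return [[rng.randint(-3, 3) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
m, n, r1, r2 = 4, 5, 2, 3  # illustrative sizes, not taken from the paper

B1, A1 = rand_matrix(m, r1, rng), rand_matrix(r1, n, rng)
B2, A2 = rand_matrix(m, r2, rng), rand_matrix(r2, n, rng)

# Left-hand side: Hadamard product of the two low-rank products.
lhs = hadamard(matmul(B1, A1), matmul(B2, A2))

# Right-hand side: product of the two Khatri-Rao factors.
left_factor = row_khatri_rao(B1, B2)  # shape m x (r1 * r2)
right_factor = transpose(row_khatri_rao(transpose(A1), transpose(A2)))  # (r1 * r2) x n
rhs = matmul(left_factor, right_factor)

identity_holds = lhs == rhs
print("Theorem 1 identity holds:", identity_holds)
```

The inner dimension of the factorization is $r_1 r_2$, so the Hadamard-structured update can have rank up to $r_1 r_2$, although it is built from factors of rank at most $r_1$ and $r_2$.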

Figures (4)

  • Figure 1: Left: Illustration of ABBA's parameterization, where the update is expressed as the Hadamard product of two learnable low-rank matrices. Right: A toy experiment demonstrating ABBA’s optimization behavior. We first train a 2-layer MLP to classify the first 8 MNIST digits, then fine-tune it to recognize the last 2. ABBA converges faster and achieves better final performance.
  • Figure 2: Empirical Reconstruction Errors. We compare ABBA and LoRA decompositions across various matrix types by measuring reconstruction error $\mathcal{E}(r)$ under equal parameter budgets. For each LoRA rank $r$, we set ABBA ranks to $r_1 = r_2 = r/2$ for a fair comparison. ABBA consistently achieves significantly lower reconstruction error than LoRA, across all matrix types.
  • Figure 3: Impact of selectively fine-tuning individual transformer components (Key, Query, Value, Output, Up, Gate, and Down projections) with ABBA (Mistral-7B).
  • Figure 4: Comparison of training memory requirements across various methods. Results are reported for all models used in our work, with sequence length and batch size fixed at 256 and 1, respectively.
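
The equal-budget setting in Figure 2 ($r_1 = r_2 = r/2$) can be made concrete with a short count. The sketch below assumes a single $m \times n$ weight matrix, LoRA factors of shapes $m \times r$ and $r \times n$, and ABBA factors $B_1, A_1, B_2, A_2$ with the shapes from Theorem 1. The function names and the 4096-dimensional layer are hypothetical, chosen only for illustration.

```python
def lora_params(m, n, r):
    """Trainable parameters of a rank-r LoRA update: B is m x r, A is r x n."""
    return r * (m + n)


def abba_params(m, n, r1, r2):
    """Trainable parameters of ABBA: (B1, A1) at rank r1 plus (B2, A2) at rank r2."""
    return (r1 + r2) * (m + n)


def lora_rank_bound(m, n, r):
    """Upper bound on the rank of a LoRA update."""
    return min(r, m, n)


def abba_rank_bound(m, n, r1, r2):
    """Upper bound on the rank of an ABBA update, from the Khatri-Rao factorization."""
    return min(r1 * r2, m, n)


m, n = 4096, 4096  # assumed layer size, for illustration only
r = 32             # LoRA rank
r1 = r2 = r // 2   # matching ABBA ranks, as in Figure 2

print(lora_params(m, n, r))          # 262144
print(abba_params(m, n, r1, r2))     # 262144
print(lora_rank_bound(m, n, r))      # 32
print(abba_rank_bound(m, n, r1, r2)) # 256
```

Both methods train the same number of parameters, while the attainable rank grows from $r$ to at most $r^2/4$. These are upper bounds only; the rank actually reached depends on the learned factors.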

Theorems & Definitions (8)

  • Theorem 1: Khatri–Rao Factorization slyusar1997new
  • proof
  • Definition 1: Rank Stability of ABBA Adapters rslorawang2024lora
  • Theorem 2: Rank-Stability of ABBA
  • proof
  • Theorem : Khatri–Rao Factorization slyusar1997new
  • proof
  • Theorem : Rank-Stability of ABBA