
DiaBlo: Diagonal Blocks Are Sufficient For Finetuning

Selcuk Gurses, Aozhong Zhang, Yanxia Deng, Xun Dong, Xin Li, Naigang Wang, Penghang Yin, Zi Yang

TL;DR

DiaBlo is a simple yet effective PEFT approach that updates only the diagonal blocks of selected model weight matrices. It eliminates the need for low-rank matrix products and converges stably and robustly, while keeping memory efficiency and training speed comparable to LoRA.

Abstract

Fine-tuning is a critical step for adapting large language models (LLMs) to domain-specific downstream tasks. To mitigate the substantial computational and memory costs of full-model fine-tuning, Parameter-Efficient Fine-Tuning (PEFT) methods have been proposed to update only a small subset of model parameters. However, performance gaps between PEFT approaches and full-model fine-tuning still exist. In this work, we present DiaBlo, a simple yet effective PEFT approach that updates only the diagonal blocks of selected model weight matrices. Unlike Low-Rank Adaptation (LoRA) and its variants, DiaBlo eliminates the need for low-rank matrix products, thereby avoiding reliance on auxiliary initialization schemes or customized optimization strategies to improve convergence. This design leads to stable and robust convergence while maintaining memory efficiency and training speed comparable to LoRA. Moreover, we provide theoretical guarantees showing that, under mild low-rank conditions, DiaBlo is more expressive than LoRA in the linear problem and converges to a stationary point of the general nonlinear full fine-tuning problem. Through extensive experiments across a range of tasks, including commonsense reasoning, arithmetic reasoning, code generation, and safety alignment, we show that fine-tuning only diagonal blocks is sufficient for strong and consistent performance. DiaBlo not only achieves competitive accuracy but also preserves high memory efficiency and fast fine-tuning speed. Code is available at https://github.com/ziyangjoy/DiaBlo.
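To make "updating only the diagonal blocks" concrete, here is a minimal NumPy sketch of a DiaBlo-style linear layer. It is not the authors' implementation: it assumes the row-vector convention y = x W with a frozen weight W0 of shape (m1, m2), a block count N that divides both m1 and m2, and zero-initialized blocks. The class `DiaBloLinear`, its method `dense_update`, and the helpers `diablo_param_count` and `lora_param_count` are hypothetical names chosen for illustration. The point it shows is that the update is applied chunk by chunk, with no low-rank product involved.

```python
import numpy as np


class DiaBloLinear:
    """Hypothetical DiaBlo-style layer: frozen W0 plus a trainable block-diagonal update.

    Assumes the row-vector convention y = x @ W with W0 of shape (m1, m2) and a
    block count N that divides both m1 and m2.
    """

    def __init__(self, w0: np.ndarray, num_blocks: int) -> None:
        m1, m2 = w0.shape
        if m1 % num_blocks or m2 % num_blocks:
            raise ValueError("num_blocks must divide both m1 and m2")
        self.w0 = w0  # frozen pretrained weight, never updated
        self.num_blocks = num_blocks
        # Trainable blocks D_1..D_N, each (m1/N) x (m2/N). Zero init (assumed here)
        # makes the adapted layer start out identical to the pretrained one.
        self.blocks = np.zeros((num_blocks, m1 // num_blocks, m2 // num_blocks))

    def forward(self, x: np.ndarray) -> np.ndarray:
        batch = x.shape[0]
        base = x @ self.w0
        # Split the m1 input features into N contiguous chunks: (batch, N, m1/N).
        xb = x.reshape(batch, self.num_blocks, -1)
        # Chunk i only meets block D_i, so no low-rank product is needed: (batch, N, m2/N).
        delta = np.einsum("bni,nij->bnj", xb, self.blocks)
        return base + delta.reshape(batch, -1)

    def dense_update(self) -> np.ndarray:
        """Materialize blockdiag(D_1, ..., D_N) as a dense m1 x m2 matrix."""
        m1, m2 = self.w0.shape
        p, q = m1 // self.num_blocks, m2 // self.num_blocks
        full = np.zeros((m1, m2))
        for i in range(self.num_blocks):
            full[i * p:(i + 1) * p, i * q:(i + 1) * q] = self.blocks[i]
        return full


def diablo_param_count(m1: int, m2: int, num_blocks: int) -> int:
    # N blocks of size (m1/N) x (m2/N) give m1 * m2 / N trainable entries.
    return num_blocks * (m1 // num_blocks) * (m2 // num_blocks)


def lora_param_count(m1: int, m2: int, rank: int) -> int:
    # LoRA factors A (m1 x r) and B (r x m2) give r * (m1 + m2) trainable entries.
    return rank * (m1 + m2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer = DiaBloLinear(rng.standard_normal((8, 12)), num_blocks=4)
    layer.blocks = rng.standard_normal(layer.blocks.shape)
    x = rng.standard_normal((5, 8))
    # The chunked einsum path agrees with a dense multiply by W0 + blockdiag(D).
    print(np.allclose(layer.forward(x), x @ (layer.w0 + layer.dense_update())))  # True
    print(diablo_param_count(4096, 4096, 64), lora_param_count(4096, 4096, 32))  # 262144 262144
```

As a budget comparison under these assumptions, a 4096×4096 projection with N = 64 blocks has the same number of trainable entries (262,144) as LoRA with rank 32. The block counts and ranks actually used in the paper may differ.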

Paper Structure

This paper contains 53 sections, 5 theorems, 31 equations, 7 figures, 20 tables.

Key Result

Theorem 1

Suppose that $\mathbf{X}$ is a generic rank-$r$ matrix. If the number of diagonal blocks $N\le \frac{m_1}{r}$ is a common factor of $m_1$ and $m_2$, then any solution to the DiaBlo least-squares problem (DiaBlo-LSQ) also solves the full least-squares problem (full-LSQ).
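Theorem 1 can be sanity-checked numerically. Because the two problems are not stated in full here, the sketch below assumes one plausible reading: full-LSQ as $\min_{\mathbf{W}}\|\mathbf{X}\mathbf{W}-\mathbf{Y}\|_F$ with $\mathbf{X}\in\mathbb{R}^{n\times m_1}$, and DiaBlo-LSQ as the same objective restricted to block-diagonal $\mathbf{W}$ with $N$ blocks. Under that reading, DiaBlo-LSQ splits into $N$ independent least-squares problems on the column blocks $\mathbf{X}_j$ of $\mathbf{X}$. When $m_1/N\ge r$, each generic $\mathbf{X}_j$ spans the same column space as $\mathbf{X}$, so both problems reach the same residual. The helper names `block_diag_lstsq` and `check_theorem1`, and the cutoff `RCOND`, are hypothetical.

```python
import numpy as np

RCOND = 1e-10  # treat singular values below 1e-10 * s_max as exact zeros


def block_diag_lstsq(x: np.ndarray, y: np.ndarray, n_blocks: int) -> np.ndarray:
    """Solve min_D ||X D - Y||_F over block-diagonal D with n_blocks blocks.

    The problem decouples: column block j of X D equals X_j D_j, where X_j is the
    j-th column block of X and D_j is the j-th diagonal block of D.
    """
    m1, m2 = x.shape[1], y.shape[1]
    if m1 % n_blocks or m2 % n_blocks:
        raise ValueError("n_blocks must be a common factor of m1 and m2")
    p, q = m1 // n_blocks, m2 // n_blocks
    d = np.zeros((m1, m2))
    for j in range(n_blocks):
        xj = x[:, j * p:(j + 1) * p]
        yj = y[:, j * q:(j + 1) * q]
        d[j * p:(j + 1) * p, j * q:(j + 1) * q] = np.linalg.lstsq(xj, yj, rcond=RCOND)[0]
    return d


def check_theorem1(n, m1, m2, r, block_counts, seed=0):
    """Return {N: (DiaBlo-LSQ residual, full-LSQ residual)} for a random instance."""
    rng = np.random.default_rng(seed)
    # A product of Gaussian factors is a generic rank-r matrix (almost surely).
    x = rng.standard_normal((n, r)) @ rng.standard_normal((r, m1))
    y = rng.standard_normal((n, m2))
    w_full = np.linalg.lstsq(x, y, rcond=RCOND)[0]
    res_full = float(np.linalg.norm(x @ w_full - y))
    results = {}
    for n_blocks in block_counts:
        d = block_diag_lstsq(x, y, n_blocks)
        results[n_blocks] = (float(np.linalg.norm(x @ d - y)), res_full)
    return results


if __name__ == "__main__":
    # m1 / r = 8, so N in {1, 2, 3, 4, 6} satisfies the theorem and N = 12 does not.
    for n_blocks, (res_diablo, res_full) in check_theorem1(50, 24, 36, 3, (1, 2, 3, 4, 6, 12)).items():
        print(n_blocks, round(res_diablo, 6), round(res_full, 6))
```

With these sizes, the residuals for $N\le 8$ should agree to numerical precision, which means the block-diagonal solution also minimizes full-LSQ. For $N = 12$, each block has only two columns, fewer than $r = 3$, so the DiaBlo residual should come out strictly larger.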

Figures (7)

  • Figure 1: Comparison between full fine-tuning, LoRA, and proposed DiaBlo.
  • Figure 2: Comparison of DiaBlo with other PEFT methods on finetuning LLaMA2-7B.
  • Figure 3: $95\%$ effective rank ratios of modules in finetuned Llama3-8B
  • Figure 4: Comparison of average gradient norms for full weights and diagonal-blocks.
  • Figure 5: Accuracy of DiaBlo on GSM8K across different learning rates and numbers of blocks $N$.
  • ...and 2 more figures

Theorems & Definitions (8)

  • Theorem 1
  • Theorem 2
  • Theorem 1
  • proof
  • Lemma 1
  • proof
  • Theorem 2
  • proof