A Bayesian Interpretation of Adaptive Low-Rank Adaptation

Haolin Chen; Philip N. Garner

TL;DR

This paper reframes adaptive budget allocation for parameter-efficient fine-tuning (PEFT) of large models in terms of Bayesian importance metrics. It centers on the signal-to-noise ratio (SNR) of parameters, estimated with the Improved Variational Online Newton (IVON) optimizer. SNR-based importance can match or exceed the performance of AdaLoRA's sensitivity-based importance while giving a ~10% speed-up. The paper also establishes a theoretical link between the two metrics, showing that the sensitivity score behaves like a Bayesian importance signal driven by parameter magnitude. The findings indicate that parameter magnitude, rather than variance, primarily governs importance, which offers a principled perspective on pruning and budget allocation in PEFT. Overall, the approach is a faster, Bayesian-grounded alternative to AdaLoRA, with competitive results on the GLUE benchmark using DeBERTaV3-base.
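The SNR-based importance described above can be computed per parameter from a Gaussian posterior. The following is a minimal sketch, not the paper's code, and it rests on several assumptions. It assumes an IVON-style diagonal posterior with mean $\mu$ and standard deviation $\sigma = 1/\sqrt{\lambda(h + \delta)}$, where $h$ is the Hessian estimate, $\lambda$ the effective sample size and $\delta$ the weight decay. It takes SNR($\theta$) to mean $|\mu|/\sigma$, and SNR($|\theta|$) to be computed from the folded-normal moments of $|\theta|$. All function names are hypothetical.

```python
import math


def ivon_posterior_std(h: float, ess: float, weight_decay: float) -> float:
    """Posterior std of an IVON-style diagonal Gaussian (assumed form).

    sigma = 1 / sqrt(ess * (h + weight_decay)), where h is the Hessian
    estimate, ess the effective sample size and weight_decay the prior
    precision term.
    """
    return 1.0 / math.sqrt(ess * (h + weight_decay))


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def snr_theta(mu: float, sigma: float) -> float:
    """SNR(theta) = |mu| / sigma for theta ~ N(mu, sigma^2) (assumed definition)."""
    return abs(mu) / sigma


def snr_abs_theta(mu: float, sigma: float) -> float:
    """SNR(|theta|) = E|theta| / std|theta| from folded-normal moments (assumed definition)."""
    z = mu / sigma
    mean_abs = (sigma * math.sqrt(2.0 / math.pi) * math.exp(-0.5 * z * z)
                + mu * (1.0 - 2.0 * normal_cdf(-z)))
    # Var|theta| = E[theta^2] - (E|theta|)^2; clamp to guard against round-off.
    var_abs = max(mu * mu + sigma * sigma - mean_abs * mean_abs, 1e-30)
    return mean_abs / math.sqrt(var_abs)


def importance_scores(mu: float, sigma: float) -> dict:
    """Candidate per-parameter metrics: SNR(theta), SNR(|theta|), |mu| and 1/sigma."""
    return {
        "snr_theta": snr_theta(mu, sigma),
        "snr_abs_theta": snr_abs_theta(mu, sigma),
        "abs_mu": abs(mu),
        "inv_sigma": 1.0 / sigma,
    }


if __name__ == "__main__":
    # Hypothetical posterior means and Hessian estimates for three parameters.
    means = [0.8, -0.05, 0.3]
    hessians = [0.2, 0.2, 5.0]
    for mu, h in zip(means, hessians):
        sigma = ivon_posterior_std(h, ess=1000.0, weight_decay=1e-4)
        print(mu, importance_scores(mu, sigma))
```

When $|\mu|/\sigma$ is large, $|\theta|$ is almost never negative, so SNR($|\theta|$) approaches SNR($\theta$). The two metrics differ mainly for parameters whose posterior straddles zero.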

Abstract

Motivated by the sensitivity-based importance score of adaptive low-rank adaptation (AdaLoRA), we use more theoretically grounded metrics, including the signal-to-noise ratio (SNR), together with the Improved Variational Online Newton (IVON) optimizer, for adaptive parameter budget allocation. The resulting Bayesian counterpart not only matches or surpasses the performance of the sensitivity-based importance metric but is also faster than AdaLoRA with Adam. Our theoretical analysis reveals a significant connection between the two metrics, providing a Bayesian perspective on why sensitivity works as an importance score. Furthermore, our findings suggest that magnitude, rather than variance, is the primary indicator of parameter importance.
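One way to see the kind of connection the abstract describes is the heuristic below. It is an illustration under stated assumptions, not necessarily the paper's derivation. It assumes IVON's per-parameter posterior $\mathcal{N}(\mu, \sigma^2)$ with $\sigma = 1/\sqrt{\lambda(h + \delta)}$, where $h$ is the Hessian estimate, $\lambda$ the effective sample size and $\delta$ the weight decay. It further assumes that $h$ can be approximated by the squared gradient $g^2$ and that $\delta \ll h$. Under these assumptions, SNR is proportional to the AdaLoRA sensitivity $|\theta g|$ evaluated at $\theta = \mu$:

```latex
\begin{aligned}
I(\theta) &= \lvert \theta \, g \rvert, \qquad g = \nabla_{\theta} \mathcal{L}, \\
\mathrm{SNR}(\theta) &= \frac{\lvert \mu \rvert}{\sigma}
  = \lvert \mu \rvert \sqrt{\lambda \, (h + \delta)} \\
  &\approx \lvert \mu \rvert \sqrt{\lambda \, g^{2}}
  = \sqrt{\lambda} \, \lvert \mu \, g \rvert
  = \sqrt{\lambda} \, I(\mu).
\end{aligned}
```

Both expressions grow linearly in the parameter magnitude $|\mu|$. This fits the observation that magnitude is the primary indicator of importance.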

Paper Structure

This paper contains 22 sections, 1 figure, and 1 table.

Introduction
Adaptive Budget Allocation
Overview
Revisiting AdaLoRA
SVD-based adaptation
Sensitivity-based importance scoring
Global budget scheduler
Bayesian Importance Scores
SNR($\theta$)
SNR($|\theta|$)
$|\mu|$ and $1/\sigma$
Variational Inference
Experiments
Models and Datasets
Implementation Details
...and 7 more sections
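For comparison, two of the AdaLoRA components listed above, sensitivity-based importance scoring and the global budget scheduler, can be sketched in plain Python. This follows our reading of the original AdaLoRA formulation. Sensitivity $I = |w \cdot \partial\mathcal{L}/\partial w|$ is smoothed by exponential moving averages of the score and of its uncertainty. The total rank budget follows a cubic decay between an initial warm-up phase and a final fixed phase. The function names and the default smoothing coefficients are illustrative assumptions.

```python
def smoothed_sensitivity(weights, grads, i_bar, u_bar, beta1=0.85, beta2=0.85):
    """One AdaLoRA-style update of smoothed sensitivity and its uncertainty.

    I      = |w * dL/dw|
    I_bar <- beta1 * I_bar + (1 - beta1) * I
    U_bar <- beta2 * U_bar + (1 - beta2) * |I - I_bar|
    score  = I_bar * U_bar
    Returns new (i_bar, u_bar, score) lists; the beta defaults are assumptions.
    """
    new_i, new_u, score = [], [], []
    for w, g, ib, ub in zip(weights, grads, i_bar, u_bar):
        sens = abs(w * g)
        ib = beta1 * ib + (1.0 - beta1) * sens
        ub = beta2 * ub + (1.0 - beta2) * abs(sens - ib)
        new_i.append(ib)
        new_u.append(ub)
        score.append(ib * ub)
    return new_i, new_u, score


def budget_schedule(step, total_steps, b_init, b_target, t_init, t_final):
    """Cubic decay of the total rank budget from b_init to b_target.

    The budget stays at b_init for the first t_init steps, decays cubically,
    and is held at b_target for the last t_final steps.
    """
    if step < t_init:
        return b_init
    if step >= total_steps - t_final:
        return b_target
    progress = (step - t_init) / (total_steps - t_init - t_final)
    return int(b_target + (b_init - b_target) * (1.0 - progress) ** 3)
```

The cubic schedule prunes aggressively early, while the scores are still changing, and slows down as the budget approaches its target.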

Figures (1)

Figure 1: Comparison of rank distributions after fine-tuning DeBERTaV3-base on MNLI, with deeper colors indicating higher ranks. Results are averaged across five runs with different random seeds. $W_q$, $W_k$, $W_v$, $W_o$: weights of the query, key, value, output layers of attention; $W_{f_1}$, $W_{f_2}$: weights of the feed-forward layers.
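Rank distributions like those in Figure 1 come from pruning the singular-value triplets of each incremental matrix $\Delta = P \Lambda Q$ against one global budget. Each triplet is a singular value in $\Lambda$ together with its column of $P$ and row of $Q$. In the minimal sketch below, a matrix's rank is simply the number of its triplets that survive a global top-$b$ selection. The sketch assumes AdaLoRA's triplet aggregation, in which the singular value's score is added to the mean scores of its $P$ column and $Q$ row. It also assumes a hypothetical dictionary of per-matrix triplet scores. With a Bayesian metric, the per-entry scores would presumably be SNR or $|\mu|$ instead of smoothed sensitivity. That substitution is our assumption.

```python
from collections import Counter


def triplet_score(s_sv, s_p_col, s_q_row):
    """Aggregate score of one triplet: singular value's score plus the mean
    scores of its column of P and row of Q (AdaLoRA-style, assumed)."""
    return s_sv + sum(s_p_col) / len(s_p_col) + sum(s_q_row) / len(s_q_row)


def allocate_ranks(triplet_scores, budget):
    """Keep the `budget` highest-scoring triplets across all matrices.

    triplet_scores maps a (hypothetical) matrix name to the list of its
    triplet scores. Returns the number of kept triplets, i.e. the rank,
    per matrix; pruned triplets would have their singular value zeroed.
    """
    ranked = sorted(
        ((score, name) for name, scores in triplet_scores.items() for score in scores),
        key=lambda pair: pair[0],
        reverse=True,
    )
    kept = Counter(name for _, name in ranked[:budget])
    return {name: kept[name] for name in triplet_scores}
```

Because the selection is global, a weight matrix whose triplets score consistently higher (for example $W_v$ versus $W_q$ in one layer) ends up with a larger rank. This is the pattern a per-layer heatmap of ranks visualizes.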
