Low-rank variational dropout: Rank selection and uncertainty in adapters

Cooper Doyle, Rebecca Chan, Andy Hu, Anna Leontjeva

TL;DR

We empirically show that BayesLoRA induces stable, non-arbitrary rank structure aligned with the intrinsic singular directions of the learned updates, and that it outperforms existing low-rank sparsification methods in accuracy at comparable training cost while delivering substantially improved predictive calibration at negligible additional overhead.

Abstract

Low-rank adaptation methods enable efficient task-specific updates in large neural networks, but provide no principled mechanism for uncertainty estimation or capacity control. We introduce Low-Rank Variational Dropout (LRVD), a Bayesian framework that operates directly in the space of low-rank adaptation. LRVD employs a scale-invariant, sparsity-inducing prior together with a structured variational family that ties uncertainty at the level of latent rank components, inducing rank-wise noise-to-signal ratios for automatic capacity selection. As a concrete instantiation, we apply LRVD to low-rank adaptation and obtain BayesLoRA, which jointly learns predictive uncertainty and the effective adapter rank with only O(r) additional parameters, where r is the adapter rank. We empirically show that BayesLoRA induces stable, non-arbitrary rank structure aligned with the intrinsic singular directions of the learned updates, and outperforms existing low-rank sparsification methods in accuracy at comparable training cost while delivering substantially improved predictive calibration at negligible additional overhead.
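
To make the idea of rank-wise noise-to-signal ratios concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes one particular parameterization that the abstract does not specify: a single learned value `log_alpha[k]` per rank component (so O(r) extra parameters), multiplicative Gaussian noise `z_k ~ N(1, alpha_k)` applied to each component of the update BA, and a pruning rule that keeps component k when `log_alpha[k] < tau`. The names `sample_delta_w`, `active_ranks`, `prune`, and the numeric values are hypothetical.

```python
import numpy as np

def sample_delta_w(B, A, log_alpha, rng):
    """Draw one noisy update B @ diag(z) @ A with z_k ~ N(1, alpha_k) (assumed noise model)."""
    alpha = np.exp(log_alpha)  # per-rank noise-to-signal ratio, one value per rank
    z = 1.0 + np.sqrt(alpha) * rng.standard_normal(alpha.shape)
    return (B * z) @ A  # broadcasting scales column k of B by z_k

def active_ranks(log_alpha, tau):
    """Mask of components kept under the assumed rule log_alpha_k < tau."""
    return log_alpha < tau

def prune(B, A, log_alpha, tau):
    """Drop rank components whose noise-to-signal ratio exceeds the threshold."""
    keep = active_ranks(log_alpha, tau)
    return B[:, keep], A[keep, :], log_alpha[keep]

rng = np.random.default_rng(0)
d_out, d_in, r = 6, 5, 4
B = rng.standard_normal((d_out, r))
A = rng.standard_normal((r, d_in))
log_alpha = np.array([-4.0, -1.0, 2.5, 5.0])  # hypothetical learned values
tau = 2.0                                     # hypothetical pruning threshold

B_p, A_p, log_alpha_p = prune(B, A, log_alpha, tau)
effective_rank = B_p.shape[1]
delta_w = sample_delta_w(B_p, A_p, log_alpha_p, rng)
```

Under these assumptions, components with a large noise-to-signal ratio contribute mostly noise and are removed, which lowers the effective rank. Repeated calls to `sample_delta_w` give different updates, and the spread across those samples is one way to read off predictive uncertainty.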

Paper Structure

This paper contains 40 sections, 31 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Rank dynamics during fine-tuning on CoLA (DeBERTa-v3-base). Heatmaps show the sum of active adapter ranks over training steps. Top: distribution of effective rank across encoder layers. Bottom: distribution across model modules (attention projections and MLP). BayesLoRA progressively concentrates capacity into a small subset of layers and modules, illustrating structured, data-driven rank pruning via rank-wise variational dropout.
  • Figure 2: Bayesian modeling in rank space. Low-rank adaptation represents updates as $\Delta W = BA$, mapping parameters from weight space into a low-dimensional rank space. LRVD places uncertainty over this rank space rather than the full weight space, enabling structured uncertainty and rank-wise sparsification with minimal overhead. The surface is illustrative and does not represent a literal posterior density.
  • Figure 3: Effect of pruning threshold $\tau$ on accuracy and effective rank. BayesLoRA maintains accuracy while reducing rank across a wide range of $\tau$, yielding a smooth accuracy–compression trade-off.
  • Figure 4: Gauge symmetry breaking via Bayesian rank selection. Left: Cumulative energy capture as a function of retained rank, comparing the SVD upper bound (blue), BayesLoRA rank ordering (orange), and random permutations (green). Right: Distribution of AUC improvements of BayesLoRA over random permutations across modules and seeds. BayesLoRA consistently recovers intrinsic structure while random orderings do not.
  • Figure 5: Stability of rank pruning over training. Adapter effective rank (mean $\pm$ std across seeds) over training steps for ARC-C (left) and WG-S (right). The effective rank decreases monotonically and exhibits low variance across seeds, supporting stable capacity selection in practice.
  • ...and 1 more figure
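
The cumulative energy comparison described in the Figure 4 caption can be sketched as follows. The captions do not define "energy capture", so this example assumes one plausible definition: the fraction of the squared Frobenius norm of ΔW = BA explained after keeping the first k rank components in a given order, i.e. 1 - ||ΔW - ΔW_k||² / ||ΔW||². Under that definition the truncated SVD is an upper bound for any ordering (Eckart-Young). The function names and the use of the curve mean as an AUC are illustrative assumptions, and a random permutation stands in for a learned ordering.

```python
import numpy as np

def svd_energy_curve(delta_w, r):
    """Best achievable energy capture at ranks 1..r (truncated SVD)."""
    s = np.linalg.svd(delta_w, compute_uv=False)
    return (np.cumsum(s**2) / np.sum(s**2))[:r]

def ordering_energy_curve(B, A, order):
    """Energy captured by the first k components of BA, taken in the given order."""
    delta_w = B @ A
    total = np.linalg.norm(delta_w) ** 2  # squared Frobenius norm
    curve = []
    for k in range(1, len(order) + 1):
        idx = order[:k]
        partial = B[:, idx] @ A[idx, :]  # rank-k partial reconstruction
        curve.append(1.0 - np.linalg.norm(delta_w - partial) ** 2 / total)
    return np.array(curve)

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 7, 4
B = rng.standard_normal((d_out, r))
A = rng.standard_normal((r, d_in))

svd_curve = svd_energy_curve(B @ A, r)
rand_curve = ordering_energy_curve(B, A, rng.permutation(r))

# Mean of the curve as a simple area-under-curve summary (assumed metric).
auc_gap = svd_curve.mean() - rand_curve.mean()
```

A rank ordering that tracks the intrinsic singular directions would give a curve close to `svd_curve`, while an arbitrary ordering generally falls below it, so `auc_gap` is never negative under this definition.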