Calibrated Adaptation: Bayesian Stiefel Manifold Priors for Reliable Parameter-Efficient Fine-Tuning

Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

TL;DR

Stiefel-Bayes Adapters (SBA) are a Bayesian PEFT framework that places a Matrix Langevin prior over orthonormal adapter factors on the Stiefel manifold $\mathrm{St}(k,d)$ and performs approximate posterior inference via a tangent-space Laplace approximation with geodesic retraction.

Abstract

Parameter-efficient fine-tuning methods such as LoRA enable practical adaptation of large language models but provide no principled uncertainty estimates, leading to poorly calibrated predictions and unreliable behavior under domain shift. We introduce Stiefel-Bayes Adapters (SBA), a Bayesian PEFT framework that places a Matrix Langevin prior over orthonormal adapter factors on the Stiefel manifold $\mathrm{St}(k,d)$ and performs approximate posterior inference via tangent space Laplace approximation with geodesic retraction. Unlike Gaussian priors in flat space projected onto orthogonality constraints, our prior on the manifold naturally encodes the inductive bias that adapter subspaces should be well conditioned and orthogonal, while the posterior provides calibrated predictive uncertainty without recalibration. We prove formally that the tangent space approximation strictly avoids the structural variance inflation inherent in projecting from ambient space, establishing a rigorous theoretical advantage for intrinsic manifold inference. Across GLUE and SuperGLUE benchmarks on RoBERTa-large, LLaMA-2-7B, LLaMA-2-13B, Mistral-7B, and Qwen2.5-7B, domain shift evaluations, selective prediction protocols, and an abstractive summarization task, SBA achieves task performance comparable to LoRA and DoRA while reducing Expected Calibration Error by 18 to 34% over deterministic baselines, improving selective prediction AUROC by 12 to 25% under domain shift, and outperforming deep ensembles of five LoRA models on OOD detection at a fraction of the parameter cost. Our results demonstrate that where you place uncertainty, on the right geometric structure, matters more than simply adding any Bayesian treatment to adapters.
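
The sampling step described in the abstract (a Gaussian in the tangent space at the MAP estimate, mapped back to the manifold by a retraction) can be illustrated with a minimal NumPy sketch. Several things here are assumptions rather than the paper's method: points of the manifold are stored as $d \times k$ matrices with orthonormal columns, the tangent Gaussian is isotropic with a hypothetical scale `sigma` instead of the Laplace covariance $\Sigma_T$, and the polar retraction stands in for the geodesic retraction. The function names are illustrative only.

```python
import numpy as np

def project_to_tangent(U, G):
    """Project an ambient d x k matrix G onto the tangent space at U.

    Tangent vectors Z at U satisfy U^T Z + Z^T U = 0, so the symmetric
    part of U^T G is removed.
    """
    UtG = U.T @ G
    sym = 0.5 * (UtG + UtG.T)
    return G - U @ sym

def polar_retraction(U, Z):
    """Map tangent vector Z at U back onto the manifold.

    Uses the orthonormal polar factor of U + Z, computed via a thin SVD.
    This is a stand-in for the paper's retraction, not a reproduction of it.
    """
    W, _, Vt = np.linalg.svd(U + Z, full_matrices=False)
    return W @ Vt

def sample_around_map(U_map, sigma, n_samples, rng):
    """Draw approximate posterior samples around a MAP estimate U_map.

    Assumption: isotropic tangent-space noise with scale sigma, in place
    of a covariance obtained from a Laplace approximation.
    """
    d, k = U_map.shape
    samples = []
    for _ in range(n_samples):
        G = sigma * rng.standard_normal((d, k))   # ambient Gaussian noise
        Z = project_to_tangent(U_map, G)          # restrict to tangent space
        samples.append(polar_retraction(U_map, Z))
    return np.stack(samples)

rng = np.random.default_rng(0)
# Hypothetical MAP estimate: a random 16 x 4 matrix with orthonormal columns.
U_map, _ = np.linalg.qr(rng.standard_normal((16, 4)))
draws = sample_around_map(U_map, sigma=0.05, n_samples=8, rng=rng)
```

Every draw stays exactly on the manifold, so no after-the-fact projection from ambient space is needed; this is the property the abstract contrasts with flat-space Gaussian priors.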

Paper Structure (30 sections, 1 theorem, 18 equations, 4 figures, 10 tables)

Key Result

Theorem 1

Let $\hat{U} \in \mathrm{St}(k,d)$ be the MAP estimate of the highly concentrated target posterior $p_{\mathrm{post}}$. Let $q_{\mathrm{tang}}$ be the intrinsic Laplace approximation, obtained by pushing forward $\mathcal{N}(0, \Sigma_T)$ on $T_{\hat{U}}\mathrm{St}(k,d)$ through a second-order retraction […] where $\mathcal{E}(\Sigma_T, \Sigma_N) > 0$ is a strictly positive penalty governed by the normal v[…]

Figures (4)

  • Figure 1: Overview of Stiefel-Bayes Adapters. The Matrix Langevin prior respects the geometry of orthonormal adapter factors, and the tangent space Laplace approximation enables efficient posterior sampling via retraction.
  • Figure 2: Reliability diagrams on MNLI (in-distribution, top) and MNLI $\to$ SNLI (domain shift, bottom). SBA maintains calibration under shift while other methods degrade.
  • Figure 3: Selective prediction on MNLI $\to$ SNLI. SBA's uncertainty estimates enable more effective abstention at all coverage levels.
  • Figure 4: Uncertainty decomposition across settings. Epistemic uncertainty (mutual information) rises sharply under domain shift and on OOD data, while aleatoric uncertainty remains stable. SBA captures genuine model ignorance that flat-space methods underestimate.
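
The quantities behind Figures 2 and 4 can be sketched in NumPy as follows. This is a hedged illustration, not the paper's evaluation code: it assumes the common equal-width-bin definition of Expected Calibration Error, and the standard entropy-based decomposition in which total uncertainty is the entropy of the averaged prediction, aleatoric uncertainty is the average entropy of the individual posterior samples, and epistemic uncertainty is their difference (the mutual information named in the Figure 4 caption). The function names and the default of 10 bins are assumptions.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width confidence bins.

    probs:  (N, C) predictive class probabilities
    labels: (N,) integer class labels
    """
    conf = probs.max(axis=1)                      # confidence of the top class
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap            # weight by bin population
    return ece

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats; eps guards against log(0)."""
    return -np.sum(p * np.log(p + eps), axis=axis)

def decompose_uncertainty(sample_probs):
    """Split predictive uncertainty into aleatoric and epistemic parts.

    sample_probs: (S, N, C) class probabilities from S posterior samples.
    Returns three arrays of shape (N,): total, aleatoric, epistemic.
    """
    mean_probs = sample_probs.mean(axis=0)
    total = entropy(mean_probs)                     # entropy of the average
    aleatoric = entropy(sample_probs).mean(axis=0)  # average of the entropies
    epistemic = total - aleatoric                   # mutual information
    return total, aleatoric, epistemic
```

When all posterior samples agree on an input, the epistemic term is zero regardless of how uncertain each individual prediction is; it grows only when the samples disagree, which is why it is the natural signal for domain shift and OOD inputs.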

Theorems & Definitions (3)

  • Theorem 1: Geometric Variance Inflation & KL Tightness
  • proof
  • Remark 1: Choice of retraction