LoRA-Ensemble: Efficient Uncertainty Modelling for Self-Attention Networks

Dominik J. Mühlematter; Michelle Halbheer; Alexander Becker; Dominik Narnhofer; Helge Aasen; Konrad Schindler; Mehmet Ozgur Turkoglu

TL;DR

This work tackles the challenge of producing well-calibrated uncertainty estimates in large transformer models without the prohibitive cost of training and deploying full ensembles. It introduces LoRA-Ensemble, a parameter-efficient implicit ensemble that freezes the pre-trained backbone and attaches low-rank updates to the attention projections. The adapted weight is $W = W_0 + \Delta W = W_0 + B A$; each ensemble member $i$ has its own update $\Delta W_i = B_i A_i$ and computes $h_i = W_0 x + B_i A_i x$. By averaging predictions across $N$ members and computing the ensemble variance, the method achieves accuracy and calibration that often surpass explicit ensembles and other implicit baselines, while dramatically reducing parameters and memory. Extensive experiments across CIFAR-100, HAM10000, iNaturalist, ESC-50, and SST-2 demonstrate strong predictive performance and superior calibration, with enhanced diversity in both function and weight spaces. The approach scales to large, fine-grained tasks and even transfers to CNNs, offering a practical and scalable route toward reliable uncertainty estimation in modern AI systems, with potential energy and environmental benefits.
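
The formulation $h_i = W_0 x + B_i A_i x$ can be illustrated with a minimal NumPy sketch. It is a toy under stated assumptions, not the authors' implementation: a single linear layer stands in for an attention projection and is read directly as class logits, and the names `lora_ensemble_predict` and `softmax` as well as all shapes and sizes are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def lora_ensemble_predict(x, W0, A, B):
    """Toy LoRA-ensemble forward pass (illustrative only).

    x:  (batch, d_in)   inputs
    W0: (d_out, d_in)   frozen weight shared by all members
    A:  (N, r, d_in)    per-member low-rank factors A_i
    B:  (N, d_out, r)   per-member low-rank factors B_i
    """
    shared = x @ W0.T                            # W_0 x, computed once
    low = np.einsum("nrd,bd->nbr", A, x)         # A_i x for every member
    delta = np.einsum("nor,nbr->nbo", B, low)    # B_i (A_i x)
    h = shared[None] + delta                     # h_i = W_0 x + B_i A_i x
    probs = softmax(h)                           # (N, batch, d_out)
    # Ensemble prediction and per-class variance across the N members.
    return probs.mean(axis=0), probs.var(axis=0)

rng = np.random.default_rng(0)
N, r, d_in, d_out, batch = 4, 2, 8, 5, 3         # hypothetical sizes
W0 = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(N, r, d_in))
B = np.zeros((N, d_out, r))                      # B_i = 0: every member equals the backbone
x = rng.normal(size=(batch, d_in))
mean_p, var_p = lora_ensemble_predict(x, W0, A, B)
```

With every $B_i = 0$ all members coincide with the frozen backbone, so the variance is zero; once the $B_i$ differ, the members disagree and the variance becomes a usable uncertainty signal. The shared term $W_0 x$ is computed only once, which is where the savings over an explicit ensemble come from.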

Abstract

Numerous real-world decisions rely on machine learning algorithms and require calibrated uncertainty estimates. However, modern methods often yield overconfident, uncalibrated predictions. The dominant approach to quantifying the uncertainty inherent in the model is to train an ensemble of separate predictors and measure their empirical variance. In an explicit implementation, the ensemble has high computational cost and memory footprint, especially if the base model itself is already large, like modern transformers. This motivates efforts to develop implicit ensemble methods that emulate the ensemble without explicitly instantiating all its members. We introduce LoRA-Ensemble, a parameter-efficient ensembling method for self-attention networks. It is based on Low-Rank Adaptation (LoRA), originally developed for efficient LLM fine-tuning, and extends it into an implicit ensembling scheme, where all ensemble members share the same, pre-trained self-attention network, but have individual low-rank matrices for the attention projections. The resulting method not only outperforms state-of-the-art implicit techniques like BatchEnsemble, but even matches or exceeds the accuracy of an Explicit Ensemble, while at the same time achieving superior calibration.
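
To make the memory argument concrete, here is a back-of-the-envelope parameter count for a single square projection matrix. The width, rank, and member count are illustrative assumptions rather than the paper's settings; biases are ignored, and `projection_params` is a hypothetical helper.

```python
def projection_params(d_model: int, rank: int, n_members: int) -> dict:
    """Parameters for one d_model x d_model projection (biases ignored)."""
    full = d_model * d_model          # dense weight W_0
    lora = 2 * rank * d_model         # A_i (rank x d_model) plus B_i (d_model x rank)
    return {
        # Explicit ensemble: every member stores its own dense weight.
        "explicit_ensemble": n_members * full,
        # LoRA-style ensemble: one shared W_0 plus N low-rank pairs.
        "lora_ensemble": full + n_members * lora,
    }

# Illustrative values only: width 768, rank 8, 16 members.
counts = projection_params(d_model=768, rank=8, n_members=16)
print(counts)  # {'explicit_ensemble': 9437184, 'lora_ensemble': 786432}
```

Under these assumed values the shared-backbone variant needs 12 times fewer parameters for this one matrix, and the gap widens as the member count grows, because each added member costs only $2 r d$ parameters instead of $d^2$.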

Paper Structure (53 sections, 28 equations, 16 figures, 20 tables)

Introduction
LoRA-Ensemble
Experiments
Computational Cost
CIFAR-100
HAM10000 Lesion Classification
Large-Scale Fine-Grained Image Classification with iNaturalist
Out-of-Distribution (OOD) Detection
Enhanced Diversity in LoRA-Ensemble
Discussion
Related Work
Estimation of Epistemic Uncertainty.
Ensembles and Implicit Ensembling.
Low-Rank Adaptation in Transformer Networks.
Conclusion
...and 38 more sections

Figures (16)

Figure 1: Schematic of LoRA-Ensemble. The computation structure of the multi-head self-attention module (right) and the LoRA-Ensemble module (bottom left). $X$ denotes the actual input, and $x$ represents the intermediate input representation.
Figure 2: Function space analysis of LoRA-Ensemble vs. Explicit Ensemble.
Figure 3: Weight space analysis of LoRA-Ensemble vs. Explicit Ensemble.
Figure 4: Accuracy and ECE on CIFAR-100, with different ensemble sizes.
Figure 5: Reliability diagrams for Explicit Ensemble (left) and LoRA-Ensemble (right) with 16 members, on CIFAR-100.
...and 11 more figures
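
The figures above report ECE (expected calibration error) and reliability diagrams. As a reference point, here is a minimal sketch of the common equal-width-bin ECE estimator. The bin count of 10 and the function name are assumptions, and the paper's exact binning may differ.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    conf = probs.max(axis=1)                  # confidence of the top prediction
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)   # samples whose confidence falls in (lo, hi]
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap        # weight by the fraction of samples in the bin
    return ece
```

A reliability diagram plots the same per-bin accuracy against per-bin confidence; a perfectly calibrated model lies on the diagonal and has an ECE of zero.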
