Parameter-efficient Bayesian Neural Networks for Uncertainty-aware Depth Estimation

Richard D. Paul, Alessio Quercia, Vincent Fortuin, Katharina Nöh, Hanno Scharr

TL;DR

This work investigates the suitability of PEFT methods for subspace Bayesian inference in large-scale Transformer-based vision models and shows that combining BitFit, DiffFit, LoRA, and CoLoRA, a novel LoRA-inspired PEFT method, with Bayesian inference enables more robust and reliable predictive performance in MDE.

Abstract

State-of-the-art approaches to computer vision tasks, such as monocular depth estimation (MDE), rely heavily on large, modern Transformer-based architectures. However, their application in safety-critical domains demands reliable predictive performance and uncertainty quantification. While Bayesian neural networks provide a conceptually simple approach to meeting these requirements, they suffer from the high dimensionality of the parameter space. Parameter-efficient fine-tuning (PEFT) methods, in particular low-rank adaptations (LoRA), have emerged as a popular strategy for adapting large-scale models to downstream tasks by performing parameter inference on lower-dimensional subspaces. In this work, we investigate the suitability of PEFT methods for subspace Bayesian inference in large-scale Transformer-based vision models. We show that combining BitFit, DiffFit, LoRA, and CoLoRA, a novel LoRA-inspired PEFT method, with Bayesian inference enables more robust and reliable predictive performance in MDE.
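
To make the idea of inference on a lower-dimensional subspace concrete, the following is a minimal NumPy sketch of a generic LoRA-style layer, in which a frozen weight `w0` is adapted by a low-rank product `b @ a`. It is not the authors' implementation and does not cover CoLoRA, which the abstract does not define. The function names, the layer sizes, and the isotropic Gaussian over the low-rank factors are all assumptions chosen for illustration; the Gaussian merely stands in for whatever approximate posterior an inference method would produce.

```python
import numpy as np

def lora_forward(x, w0, a, b):
    """Apply the frozen weight w0 plus the low-rank update b @ a to inputs x."""
    return x @ (w0 + b @ a).T

def trainable_params(d_out, d_in, rank):
    """Return (full, low_rank) parameter counts for a single weight matrix."""
    return d_out * d_in, rank * (d_out + d_in)

def sample_predictions(x, w0, a_mean, b_mean, std, n_samples, rng):
    """Draw predictions under a hypothetical isotropic Gaussian over (a, b)."""
    outs = []
    for _ in range(n_samples):
        a = a_mean + std * rng.normal(size=a_mean.shape)
        b = b_mean + std * rng.normal(size=b_mean.shape)
        outs.append(lora_forward(x, w0, a, b))
    return np.stack(outs)  # shape: (n_samples, batch, d_out)

rng = np.random.default_rng(0)
d_out, d_in, rank = 64, 32, 4             # illustrative sizes only
w0 = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
a = 0.01 * rng.normal(size=(rank, d_in))  # low-rank factor
b = np.zeros((d_out, rank))               # zero init: the update starts at zero
x = rng.normal(size=(5, d_in))

full, low_rank = trainable_params(d_out, d_in, rank)
samples = sample_predictions(x, w0, a, b, std=0.01, n_samples=8, rng=rng)
pred_mean = samples.mean(axis=0)  # predictive mean
pred_std = samples.std(axis=0)    # elementwise predictive spread
```

In this sketch only `a` and `b` are treated as uncertain, so the distribution lives on `rank * (d_out + d_in)` parameters rather than `d_out * d_in`.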

Paper Structure

This paper contains 17 sections, 8 equations, and 4 figures.

Figures (4)

  • Figure 1: Negative log-likelihood (NLL) for all combinations of inference and PEFT methods under consideration, evaluated on the NYU data set. Except for SWAG-LR, all methods achieve improved NLL over the deterministic baseline. Error bars indicate 95% intervals across 5 replicate runs. Numbers in the dots indicate the rank parameter used.
  • Figure 2: Test loss per quantile of most certain predictions, evaluated on the NYU data set. Except for DeepEns, all methods achieve improved test loss on more certain pixels, suggesting good calibration. Uncertainty was estimated using the pixelwise standard deviation. For LoRA and CoLoRA, only the results for the rank with the lowest test loss on the 5% quantile are depicted. The prediction using the publicly available checkpoint was used as a baseline. Shaded areas indicate 95% intervals across 5 replicate runs.
  • Figure 3: Test loss on the 25%, 50%, and 75% quantiles for LoRA and CoLoRA against the rank parameter, evaluated on the NYU data set. No clear trend favoring higher ranks can be identified. Shaded areas indicate 95% intervals across 5 replicate runs.
  • Figure 4: Evaluations on the KITTI data set.
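
The evaluation described in the caption of Figure 2 (test loss restricted to the most certain pixels, with uncertainty taken as the pixelwise standard deviation) can be sketched as follows. This is a hedged illustration, not the authors' evaluation code: the function name is hypothetical, and the squared error is an assumed stand-in for the test loss, which the captions do not specify.

```python
import numpy as np

def loss_per_certainty_quantile(samples, target, quantiles):
    """Mean loss over the fraction q of pixels with the lowest uncertainty.

    samples: array of shape (n_samples, H, W), one depth map per posterior sample
    target:  array of shape (H, W), ground-truth depth
    """
    mean = samples.mean(axis=0)        # predictive mean per pixel
    std = samples.std(axis=0)          # pixelwise standard deviation (uncertainty)
    err = (mean - target) ** 2         # assumed loss: squared error per pixel
    order = np.argsort(std.ravel())    # most certain pixels first
    sorted_err = err.ravel()[order]
    result = {}
    for q in quantiles:
        k = max(1, int(round(q * sorted_err.size)))  # number of pixels kept
        result[q] = float(sorted_err[:k].mean())
    return result

# Toy data in which the error grows with the uncertainty.
s = np.arange(16, dtype=float).reshape(4, 4)
toy_samples = np.stack([np.zeros_like(s), 2.0 * s])  # mean = s, std = s
toy_target = np.zeros_like(s)
toy_result = loss_per_certainty_quantile(toy_samples, toy_target, [0.25, 0.5, 1.0])
```

If the uncertainty estimate is informative, the loss should shrink as the quantile gets smaller, which is the pattern the caption interprets as a sign of good calibration.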