Quantifying Epistemic Uncertainty in Absolute Pose Regression

Fereidoon Zangeneh, Amit Dekel, Alessandro Pieropan, Patric Jensfelt

TL;DR

This paper tackles the problem that absolute pose regression for visual relocalization often lacks reliable, interpretable confidence measures, especially when test data falls outside the training distribution. It introduces a conditional variational autoencoder to model the conditional pose distribution $p(y|\mathbf{x})$, enabling sampling of multiple plausible poses and estimation of the likelihood of observations to quantify epistemic uncertainty. By framing uncertainty in terms of likelihood and leveraging importance sampling within a CVAE, the approach unifies handling epistemic and aleatoric uncertainty, including ambiguous observations due to repetitive structures. Empirical results across indoor and outdoor datasets show stronger correlation between estimated uncertainty and prediction error than prior methods, demonstrating improved reliability in challenging, out-of-distribution, or ambiguous scenarios with practical implications for robust visual relocalization.
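The importance-sampling idea mentioned above can be illustrated on a toy latent-variable model. This is a minimal sketch, not the paper's network or its exact estimator: it assumes a one-dimensional linear-Gaussian model ($z \sim \mathcal{N}(0,1)$, $y \mid z \sim \mathcal{N}(az, s^2)$), drops the image conditioning, and uses a deliberately widened Gaussian proposal $q(z \mid y)$ in place of a learned encoder. The names `is_log_likelihood`, `exact_log_likelihood`, `a`, `s`, and `widen` are hypothetical. The estimate is the standard $\log \frac{1}{K}\sum_k p(y \mid z_k)\,p(z_k)/q(z_k \mid y)$ with $z_k \sim q$, which can be checked here against the closed-form marginal.

```python
import numpy as np


def log_normal(v, mean, std):
    """Log-density of a univariate Gaussian N(mean, std^2) at v."""
    return -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((v - mean) / std) ** 2


def is_log_likelihood(y, a=2.0, s=0.5, widen=1.5, num_samples=20000, seed=0):
    """Importance-sampled estimate of log p(y) for the toy model
    z ~ N(0, 1), y | z ~ N(a * z, s^2).

    The proposal q(z | y) is the exact posterior with its standard
    deviation multiplied by `widen`; it stands in for a learned encoder.
    """
    rng = np.random.default_rng(seed)
    post_var = 1.0 / (1.0 + a**2 / s**2)      # exact posterior variance
    post_mean = post_var * a * y / s**2       # exact posterior mean
    q_std = widen * np.sqrt(post_var)         # widened proposal
    z = rng.normal(post_mean, q_std, size=num_samples)
    # log importance weights: log p(y|z) + log p(z) - log q(z|y)
    log_w = (
        log_normal(y, a * z, s)
        + log_normal(z, 0.0, 1.0)
        - log_normal(z, post_mean, q_std)
    )
    # numerically stable log-mean-exp of the weights
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))


def exact_log_likelihood(y, a=2.0, s=0.5):
    """Closed-form marginal of the toy model: y ~ N(0, a^2 + s^2)."""
    return log_normal(y, 0.0, np.sqrt(a**2 + s**2))


if __name__ == "__main__":
    for y in (0.5, 8.0):
        print(y, is_log_likelihood(y), exact_log_likelihood(y))
```

In this toy setting, a value of $y$ far from where the model places its mass receives a much lower estimated log-likelihood than a typical one, which is the sense in which a likelihood can act as a confidence signal.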

Abstract

Visual relocalization is the task of estimating the camera pose given an image it views. Absolute pose regression offers a solution to this task by training a neural network, directly regressing the camera pose from image features. While an attractive solution in terms of memory and compute efficiency, absolute pose regression's predictions are inaccurate and unreliable outside the training domain. In this work, we propose a novel method for quantifying the epistemic uncertainty of an absolute pose regression model by estimating the likelihood of observations within a variational framework. Beyond providing a measure of confidence in predictions, our approach offers a unified model that also handles observation ambiguities, probabilistically localizing the camera in the presence of repetitive structures. Our method outperforms existing approaches in capturing the relation between uncertainty and prediction error.

Paper Structure

This paper contains 21 sections, 3 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Querying any absolute pose regression network on a trajectory and data domain that differ from the training data results in high prediction errors (depicted by long lines connecting predictions and ground truth). Our proposed epistemic uncertainty quantification approach estimates the likelihood of test samples belonging to the training distribution (visualized by a color map), offering a guide for trusting the predictions. The color of the predictions and their ground truths, which encodes the uncertainty, is highly correlated with the prediction error (the length of the lines).
  • Figure 2: Our pipeline models a scene in a conditional VAE setup. Given a test-time image observation $\boldsymbol{x}$, the decoder is used to sample poses $\hat{y}$ from the posterior distribution of camera poses $p(y \mid \boldsymbol{x})$. Reconstructing $\hat{y}$ through the VAE pipeline then gives an estimate of $\log p(\hat{y} \mid \boldsymbol{x})$, that is, the log-likelihood of the test sample under the training distribution. This reflects the model's epistemic uncertainty about the observation $\boldsymbol{x}$.
  • Figure 3: Test sequence kth_day_09 evaluated on models trained on kth_day_06, both from the Multi-Campus Dataset (nguyen2024mcd). The lines depict the translation error between the predictions ($\boldsymbol{\circ}$) and ground truth ($\bullet$), and are colored from cyan to magenta by the epistemic uncertainty. The length of the lines is expected to correlate with epistemic uncertainty, as is the case in our predictions: cyan for short lines (small error) through magenta for long lines (large error).
  • Figure 4: The relationship between quantified epistemic uncertainty and prediction error across methods for our test sequence from the Multi-Campus Dataset. The first two columns present scatter plots of all test samples, while the last two columns focus on samples with a translation error of less than $50\,\mathrm{m}$. Since Spearman's rank correlation is invariant to increasing affine transformations, we scale and shift the uncertainty distribution in each plot so that its 10th and 90th percentiles align with those of the error distribution for better visualization, and therefore omit the $y$-axis ticks. Our proposed approach demonstrates higher correlations between uncertainty and error, both with and without sample filtering.
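
The rescaling described in the Figure 4 caption can be sketched as follows. This is an illustrative example on synthetic data, not the paper's results or plotting code; the helper names `spearman` and `align_percentiles` are hypothetical, and ranks are computed without tie handling on the assumption that values are continuous. It shows that shifting and scaling the uncertainties so their 10th and 90th percentiles match those of the errors leaves Spearman's rank correlation unchanged, because a positive-scale affine map preserves ordering.

```python
import numpy as np


def rankdata(v):
    """Ranks 1..n of the entries of v (no tie handling; assumes distinct values)."""
    order = np.argsort(v)
    ranks = np.empty(len(v), dtype=float)
    ranks[order] = np.arange(1, len(v) + 1)
    return ranks


def spearman(u, e):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return np.corrcoef(rankdata(u), rankdata(e))[0, 1]


def align_percentiles(u, e, lo=10, hi=90):
    """Affinely map u so its lo-th and hi-th percentiles match those of e."""
    u_lo, u_hi = np.percentile(u, [lo, hi])
    e_lo, e_hi = np.percentile(e, [lo, hi])
    scale = (e_hi - e_lo) / (u_hi - u_lo)  # positive when both spreads are positive
    return (u - u_lo) * scale + e_lo


if __name__ == "__main__":
    # Synthetic stand-ins for per-sample uncertainty and translation error.
    rng = np.random.default_rng(0)
    uncertainty = rng.normal(size=500)
    error = 10.0 * np.exp(uncertainty + 0.5 * rng.normal(size=500))

    aligned = align_percentiles(uncertainty, error)
    print(spearman(uncertainty, error), spearman(aligned, error))
```

Because only the ordering of the uncertainties matters for the reported correlation, the absolute values on the rescaled axis carry no meaning, which is consistent with the caption's choice to omit the $y$-axis ticks.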