Conditional Variational Autoencoders for Probabilistic Pose Regression

Fereidoon Zangeneh, Leonard Bruns, Amit Dekel, Alessandro Pieropan, Patric Jensfelt

TL;DR

This work proposes a probabilistic method that predicts the posterior distribution of camera poses given an observed image. The result is a generative model of camera poses conditioned on an image, from which samples of the pose posterior distribution can be drawn.

Abstract

Robots rely on visual relocalization to estimate their pose from camera images when they lose track. One of the challenges in visual relocalization is the presence of repetitive structures in the robot's operating environment. This calls for probabilistic methods that support multiple hypotheses for the robot's pose. We propose such a probabilistic method to predict the posterior distribution of camera poses given an observed image. Our proposed training strategy results in a generative model of camera poses given an image, which can be used to draw samples from the pose posterior distribution. Our method is streamlined and well-founded in theory, and it outperforms existing methods on localization in the presence of ambiguities.

Paper Structure (27 sections, 2 equations, 4 figures, 2 tables)

This paper contains 27 sections, 2 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1:
  • Figure 2: (a) Our pose generative model is trained as the decoder in a conditional variational autoencoder pipeline reconstructing the ground-truth pose $y \in \mathrm{SE}(3)$ for an image $\boldsymbol{x} \in \mathbb{R}^{H \times W \times 3}$. The loss terms used in the learning objective are shown in orange. During training, the latent posterior only partly overlaps with the latent prior, resulting in generated pose samples concentrated at the ground-truth pose. (b) At inference time, latent samples are drawn from the prior distribution and mapped to distinct modes in $\mathrm{SE}(3)$. In the 3D rendering of the scene, the query image views an ambiguous landing of the staircase, and the output pose samples are concentrated at three modes looking at different but visually similar landings, including the ground truth. Pose samples are shown as teal camera frusta and the ground truth as an orange camera frustum.
  • Figure 3:
  • Figure 4:
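
The inference procedure described in the Figure 2 caption (draw latent samples from the prior, then decode each one together with the image into a pose) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the standard normal prior, the latent size, the hand-written `toy_decoder` standing in for the learned network, the landing heights, and the simplified pose `(x, y, height)` used in place of a full $\mathrm{SE}(3)$ element.

```python
import random

LATENT_DIM = 2  # assumed latent size, chosen only for illustration


def sample_prior(rng):
    """Draw one latent vector from a standard normal prior (an assumption)."""
    return [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]


def toy_decoder(image_feature, z):
    """Hypothetical stand-in for the learned decoder p(pose | image, z).

    The first latent coordinate selects one of three visually similar
    landings; the second adds a small spread around the selected mode.
    """
    heights = (0.0, 3.0, 6.0)  # made-up landing heights in metres
    if z[0] < -0.43:
        mode = 0
    elif z[0] < 0.43:
        mode = 1
    else:
        mode = 2
    x, y = image_feature  # a made-up 2D "image feature" reused as position
    return (x, y, heights[mode] + 0.05 * z[1])


def sample_poses(image_feature, n, seed=0):
    """Sample n pose hypotheses for one image by decoding prior samples."""
    rng = random.Random(seed)
    return [toy_decoder(image_feature, sample_prior(rng)) for _ in range(n)]


poses = sample_poses((1.0, 2.0), 500)
# Group samples by the nearest landing to see which modes were produced.
modes = sorted({round(p[2] / 3.0) for p in poses})
print(len(poses), modes)
```

In this toy setting a single image yields pose samples spread over several landings, which is the multi-hypothesis behaviour the caption describes. In a trained model the mapping from latent samples to modes would be learned rather than hard-coded.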