VKFPos: A Learning-Based Monocular Positioning with Variational Bayesian Extended Kalman Filter Integration

Jian-Yu Chen, Yi-Ru Chen, Yin-Qiao Chang, Che-Ming Li, Jann-Long Chern, Chih-Wei Huang

TL;DR

VKFPos tackles monocular 6DoF positioning by fusing learning-based absolute and relative pose estimates within a probabilistic EKF framework under variational Bayesian inference. It decomposes the posterior into APR and RPR components, with each branch predicting pose means and diagonal covariances in $SE(3)$, and uses these covariances to weight training losses and EKF updates. The approach yields competitive single-shot accuracy on indoor and outdoor datasets and surpasses temporal APR and model-based integrations when sequential imagery is available, while maintaining robust covariance estimates for stability. Experimental results on 7-Scenes and Oxford RobotCar demonstrate improved translation and rotation accuracy and highlight VKFPos’s robustness and real-time suitability for autonomous navigation and robotics.

Abstract

This paper addresses the challenges in learning-based monocular positioning by proposing VKFPos, a novel approach that integrates Absolute Pose Regression (APR) and Relative Pose Regression (RPR) via an Extended Kalman Filter (EKF) within a variational Bayesian inference framework. Our method shows that the essential posterior probability of the monocular positioning problem can be decomposed into APR and RPR components. This decomposition is embedded in the deep learning model by predicting covariances in both APR and RPR branches, allowing them to account for associated uncertainties. These covariances enhance the loss functions and facilitate EKF integration. Experimental evaluations on both indoor and outdoor datasets show that the single-shot APR branch achieves accuracy on par with state-of-the-art methods. Furthermore, for temporal positioning, where consecutive images allow for RPR and EKF integration, VKFPos outperforms temporal APR and model-based integration methods, achieving superior accuracy.
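
The abstract states that predicted covariances enter the loss functions but does not give the loss itself. A minimal sketch of one common way to do this is the heteroscedastic Gaussian negative log-likelihood with a diagonal covariance, shown below. The function name `gaussian_nll`, the log-variance parameterization, and the treatment of the 6-D pose as a flat vector are assumptions for illustration, not the paper's formulation.

```python
import math

def gaussian_nll(pred, target, log_var):
    """Diagonal-Gaussian negative log-likelihood (additive constant dropped).

    pred, target, log_var are equal-length sequences of floats.
    log_var[i] is the predicted log-variance of component i (an assumed
    parameterization that keeps the variance positive).
    """
    total = 0.0
    for p, t, s in zip(pred, target, log_var):
        # The squared error is scaled by the inverse variance exp(-s);
        # the +s term penalizes claiming large uncertainty everywhere.
        total += 0.5 * (math.exp(-s) * (p - t) ** 2 + s)
    return total

# A large error costs less when the model also reports high variance.
confident = gaussian_nll([3.0], [0.0], [0.0])
uncertain = gaussian_nll([3.0], [0.0], [math.log(9.0)])
```

Under this form, a component with a large error is penalized less when its predicted variance is also large, which is the sense in which covariances can weight a training loss.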

Paper Structure

This paper contains 14 sections, 1 theorem, 16 equations, 4 figures, 3 tables.

Key Result

Theorem 1

If $\hat{\mathbf{x}}_{t}$ is only conditioned to the measurement $\mathbf{z}_t$, the posterior distribution of $\hat{\mathbf{x}}_{t}$ given $\hat{\mathbf{x}}_{t-1}$, $\mathbf{u}_{t, t-1}$, and $\mathbf{z}_t$ will be proportional to the product of the measurement likelihood and the transition probabi
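
In symbols, the visible part of the statement can be read as the usual Bayes-filter factorization below. This is a sketch under assumed notation: $p(\cdot)$ denotes a probability density, and the truncated remainder of the theorem is not reconstructed here.

$$
p\left(\hat{\mathbf{x}}_{t} \mid \hat{\mathbf{x}}_{t-1}, \mathbf{u}_{t,t-1}, \mathbf{z}_{t}\right) \;\propto\; p\left(\mathbf{z}_{t} \mid \hat{\mathbf{x}}_{t}\right)\, p\left(\hat{\mathbf{x}}_{t} \mid \hat{\mathbf{x}}_{t-1}, \mathbf{u}_{t,t-1}\right)
$$

The first factor is the measurement likelihood and the second is the transition probability.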

Figures (4)

  • Figure 1: The architecture of learning-based monocular positioning with extended Kalman filter integration, VKFPos.
  • Figure 2: The upper branch is the relative pose estimator and the lower branch is the absolute pose estimator. The visual encoder is ResNet34 (He et al., 2016).
  • Figure 3: Visualization of the EKF's role in the overall scheme. Absolute poses act as measurements and relative poses act as the control input, so $\hat{\mathbf{x}}_t$ is the final prediction that integrates both sources of information through the EKF.
  • Figure 4: Temporal positioning trajectories of MapNet (upper), AtLoc+ (center), and VKFPos (lower) on the Oxford RobotCar dataset. Ground truth is shown in black and predictions in red; the star marks the starting point.
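
The roles described in the Figure 3 caption (relative pose as control input, absolute pose as measurement) can be sketched as a single predict/update step. The sketch below is a deliberately simplified, per-axis linear version: it assumes the pose is a flat vector in a local tangent space, that relative motion composes additively, that the measurement model is the identity, and that all covariances are diagonal. The function name `ekf_step` and its arguments are hypothetical, and the paper's actual filter operates on $SE(3)$ poses, which this does not reproduce.

```python
def ekf_step(x_prev, P_prev, u, Q, z, R):
    """One simplified filter step with diagonal covariances.

    x_prev, P_prev: previous pose estimate and its per-axis variances
    u, Q:           relative pose (control input) and its per-axis variances
    z, R:           absolute pose (measurement) and its per-axis variances
    All arguments are equal-length sequences of floats.
    """
    # Predict: apply the relative motion; the uncertainties add up.
    # (Additive composition is an assumption; SE(3) composition is not.)
    x_pred = [x + d for x, d in zip(x_prev, u)]
    P_pred = [p + q for p, q in zip(P_prev, Q)]

    # Update: with an identity measurement model and diagonal covariances,
    # the Kalman gain reduces to a per-axis ratio of variances.
    K = [p / (p + r) for p, r in zip(P_pred, R)]
    x_new = [xp + k * (zi - xp) for xp, k, zi in zip(x_pred, K, z)]
    P_new = [(1.0 - k) * p for k, p in zip(K, P_pred)]
    return x_new, P_new

# One axis: the prediction is 1.0 (variance 2.0), the measurement is 2.0
# (variance 2.0), so the fused estimate lies halfway between them.
x, P = ekf_step([0.0], [1.0], [1.0], [1.0], [2.0], [2.0])
```

The gain shows how the predicted covariances steer the fusion: a low-variance absolute pose pulls the estimate toward the measurement, while a high-variance one leaves the motion-based prediction largely unchanged.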

Theorems & Definitions (2)

  • Theorem 1
  • proof