An Online Adaptation Method for Robust Depth Estimation and Visual Odometry in the Open World

Xingwu Ji, Haochen Niu, Dexin Duan, Rendong Ying, Fei Wen, Peilin Liu

TL;DR

The paper tackles the generalization challenge of learning-based monocular depth and visual odometry in open-world settings by introducing an online self-supervised adaptation framework. It couples a pre-trained depth model equipped with lightweight refiners (R-DepthNet) with a pseudo RGB-D SLAM in a closed loop, enhanced by Sparse Depth Densification and Dynamic Consistency Enhancement to generate pseudo-depths and valid masks during online updates. The approach achieves robust depth and pose estimation across KITTI, TUM, and a mobile robot platform, with online adaptation requiring only a small fraction of trainable parameters and demonstrating fast convergence. This work enables practical deployment of learning-based VO systems in diverse environments by leveraging real-time feedback from SLAM to adapt depth estimation on the fly.
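The idea of adapting only a few refiner parameters on top of a frozen depth model, supervised by SLAM-derived pseudo-depths on valid pixels, can be illustrated with a toy sketch. It rests on loud assumptions: the real R-DepthNet refiners are network modules whose architecture is not described here. The hypothetical "refiner" below is just a two-scalar affine correction in log-depth space (`a`, `b`). The loss is a masked squared log-depth error chosen for simplicity, not the paper's objective. The helpers `refine` and `adapt_step` are illustrative names only.

```python
import numpy as np

def refine(base_depth, a, b):
    """Hypothetical lightweight refiner: affine correction in log-depth space.

    log D_ref = a * log D_base + b. Only the scalars a and b are trainable;
    the frozen base depth model that produced base_depth is left untouched.
    """
    return np.exp(a * np.log(base_depth) + b)

def adapt_step(base_depth, pseudo_depth, mask, a, b, lr=0.2):
    """One gradient-descent step on a masked squared log-depth error (assumed loss)."""
    x = np.log(base_depth[mask])    # frozen-model log-depth on valid pixels only
    y = np.log(pseudo_depth[mask])  # pseudo-depth target on valid pixels only
    r = a * x + b - y               # per-pixel residual
    loss = np.mean(r ** 2)
    grad_a = 2.0 * np.mean(r * x)   # d(loss)/da
    grad_b = 2.0 * np.mean(r)       # d(loss)/db
    return a - lr * grad_a, b - lr * grad_b, loss

# Toy closed loop: the frozen model under-estimates depth by a factor of 2,
# and the first row is an invalid region (e.g. a moving object) whose
# pseudo-depth is wrong, so the valid mask excludes it.
base = np.linspace(1.0, 5.0, 100).reshape(10, 10)
pseudo = 2.0 * base
pseudo[0, :] = 100.0
mask = np.ones_like(base, dtype=bool)
mask[0, :] = False

a, b = 1.0, 0.0  # identity refiner at the start of adaptation
for _ in range(1000):
    a, b, loss = adapt_step(base, pseudo, mask, a, b)

refined = refine(base, a, b)
print(f"a={a:.4f}, b={b:.4f}, loss={loss:.2e}")
```

Because the masked-out row never enters the loss, the corrupted pseudo-depths there do not bias the correction. Only two parameters are updated, which mirrors (in miniature) why restricting adaptation to small refiner modules keeps online updates cheap.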

Abstract

Recently, learning-based robotic navigation systems have gained extensive research attention and made significant progress. However, the diversity of open-world scenarios poses a major challenge to the generalization of such systems in practical settings. Specifically, learned systems for scene measurement and state estimation tend to degrade when the application scenarios deviate from the training data, resulting in unreliable depth and pose estimation. To address this problem, this work aims to develop a visual odometry system that can rapidly adapt to diverse novel environments in an online manner. To this end, we construct a self-supervised online adaptation framework for monocular visual odometry aided by an online-updated depth estimation module. First, we design a monocular depth estimation network with lightweight refiner modules, which enables efficient online adaptation. Then, we construct an objective for self-supervised learning of the depth estimation module based on the output of the visual odometry system and the contextual semantic information of the scene. Specifically, a sparse depth densification module and a dynamic consistency enhancement module are proposed to leverage camera poses and contextual semantics to generate pseudo-depths and valid masks for the online adaptation. Finally, we demonstrate the robustness and generalization capability of the proposed method in comparison with state-of-the-art learning-based approaches on urban and in-house datasets and on a robot platform. Code is publicly available at: https://github.com/jixingwu/SOL-SLAM.
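To make "sparse depth densification" concrete, here is a minimal sketch of turning sparse SLAM depths into a dense pseudo-depth map. This is a swapped-in, deliberately naive technique (nearest-neighbour filling), not the paper's module, whose exact procedure is not given in this summary. The optional region-restricted variant assumes a per-pixel segment-label map as a hypothetical stand-in for the object regions $\mathcal{S}$. The function names `densify_nearest` and `densify_by_region` are illustrative.

```python
import numpy as np

def densify_nearest(sparse_depth):
    """Fill every pixel with the depth of its nearest valid sparse sample.

    sparse_depth: (H, W) array, 0 where no SLAM landmark projects.
    Brute force O(H * W * N); fine for a sketch, too slow for real images.
    """
    h, w = sparse_depth.shape
    vs, us = np.nonzero(sparse_depth > 0)  # row/col of each sparse sample
    if vs.size == 0:
        raise ValueError("no valid sparse depth samples")
    values = sparse_depth[vs, us]
    grid_v, grid_u = np.mgrid[0:h, 0:w]
    # Squared pixel distance from every pixel to every sample: (H, W, N)
    d2 = (grid_v[..., None] - vs) ** 2 + (grid_u[..., None] - us) ** 2
    return values[np.argmin(d2, axis=-1)]

def densify_by_region(sparse_depth, labels):
    """Nearest-neighbour fill restricted to each segmented region (assumption).

    labels: (H, W) int array of hypothetical region ids. Pixels only borrow
    depth from samples in their own region; regions without any sample stay
    0, i.e. are treated as invalid.
    """
    dense = np.zeros_like(sparse_depth, dtype=float)
    for region in np.unique(labels):
        in_region = labels == region
        samples = np.where(in_region, sparse_depth, 0.0)
        if not np.any(samples > 0):
            continue
        dense[in_region] = densify_nearest(samples)[in_region]
    return dense

# Two sparse samples; the left and right halves are different "objects".
sparse = np.zeros((4, 4))
sparse[0, 0] = 2.0
sparse[3, 3] = 8.0
labels = np.zeros((4, 4), dtype=int)
labels[:, 2:] = 1

dense_nn = densify_nearest(sparse)
dense_region = densify_by_region(sparse, labels)
print(dense_region)
```

Restricting the fill to a region avoids bleeding depth across object boundaries, which is one plausible reason to bring contextual semantics into densification.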

Paper Structure

This paper contains 21 sections, 15 equations, 14 figures, 7 tables.

Figures (14)

  • Figure 1: An example of our proposed method on TUM dataset.
  • Figure 2: Overview of our online adaptation framework. In this system, the depth estimation module R-DepthNet and the pseudo RGB-D SLAM reinforce each other in an online manner, and output the refined depth maps and camera poses, respectively. $\mathcal{S}$: object regions, $M$: consistency masks and $D_d$: densified depths.
  • Figure 3: Extracted and matched feature points.
  • Figure 4: Illustration of the sparse depth densification.
  • Figure 5: Illustration of two masking processes. Top to bottom: segmented images, self-discovered masks $W_s$ [bian2021unsupervised], semantic masks $M_{sc}$, and final consistency masks $M$. In $M_{sc}$, white regions indicate dynamic regions, while dark regions indicate regions that satisfy the conditions of the equation defining $M_{sc}$.
  • ...and 9 more figures
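The two masks in Figure 5 can be illustrated with a short sketch. The self-discovered mask follows the depth-inconsistency weighting of Bian et al.: $W_s = 1 - |D_a^b - D'_b| / (D_a^b + D'_b)$, where $D_a^b$ is a depth map warped into the other view and $D'_b$ is that view's depth sampled at the same pixels. How the paper fuses $W_s$ with $M_{sc}$ into $M$ is not stated here. The fusion below (zeroing pixels flagged as dynamic) is an assumption, and the boolean `dynamic` input is a hypothetical stand-in for the semantic mask.

```python
import numpy as np

def self_discovered_mask(depth_warped, depth_interp, eps=1e-7):
    """W_s = 1 - |D_a^b - D'_b| / (D_a^b + D'_b), as in Bian et al.

    eps only guards against division by zero; it is not part of the formula.
    """
    diff = np.abs(depth_warped - depth_interp) / (depth_warped + depth_interp + eps)
    return 1.0 - diff

def consistency_mask(depth_warped, depth_interp, dynamic):
    """Hypothetical fusion: keep W_s as a soft weight, but zero out pixels
    that the semantic mask flags as dynamic (dynamic == True)."""
    w_s = self_discovered_mask(depth_warped, depth_interp)
    return np.where(dynamic, 0.0, w_s)

# Toy 2x2 example: one geometrically inconsistent pixel, one dynamic pixel.
depth_warped = np.full((2, 2), 2.0)
depth_interp = np.array([[2.0, 2.0], [6.0, 2.0]])
dynamic = np.array([[False, True], [False, False]])

M = consistency_mask(depth_warped, depth_interp, dynamic)
print(M)
```

Keeping $W_s$ soft (rather than thresholding it) down-weights inconsistent pixels in proportion to their disagreement, while the semantic mask removes dynamic regions outright.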