MPVO: Motion-Prior based Visual Odometry for PointGoal Navigation

Sayan Paul, Ruddra dev Roychoudhury, Brojeshwar Bhowmick

TL;DR

This work proposes a robust and sample-efficient VO pipeline that exploits the motion priors available while an agent navigates an environment, and demonstrates superior accuracy and robustness in point-goal navigation compared to state-of-the-art VO method(s).

Abstract

Visual odometry (VO) is essential for accurate point-goal navigation of embodied agents in indoor environments where GPS and compass sensors are unreliable and inaccurate. However, traditional VO methods struggle in wide-baseline scenarios, where fast robot motions and low frame rates (FPS) during inference degrade their performance, leading to drift and catastrophic failures in point-goal navigation. Recent deep-learned VO methods are robust but sample-inefficient to train; hence, they require large datasets and substantial compute. We therefore propose a robust and sample-efficient VO pipeline based on the motion priors available while an agent navigates an environment. It consists of a training-free, action-prior based geometric VO module that estimates a coarse relative pose, which a deep-learned VO model then consumes as a motion prior to produce a fine relative pose for the navigation policy. This strategy lets our pipeline achieve up to 2x sample efficiency during training and yields superior accuracy and robustness in point-goal navigation compared to state-of-the-art VO method(s). The proposed approach is evaluated on realistic indoor environments of the Gibson dataset in the AI-Habitat simulator, using navigation metrics (success, SPL) and pose metrics (RPE, ATE). We hope this method opens a direction of work in which motion priors from various sources are used to improve VO estimates and achieve better results in embodied navigation tasks.
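
The pose metrics named in the abstract can be illustrated for planar trajectories. The sketch below is not the paper's evaluation code: it assumes poses are (x, y, theta) tuples in a common world frame, with the estimated and ground-truth trajectories both starting at the episode's initial pose so no alignment step is applied, and it uses the common RMSE forms of ATE and RPE. The paper's exact protocol (units, step size, alignment) may differ, and the function names are hypothetical.

```python
import math

def wrap_angle(a):
    """Wrap an angle in radians to [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi

def compose(a, b):
    """SE(2) composition a * b for poses (x, y, theta)."""
    x, y, th = a
    bx, by, bth = b
    return (x + math.cos(th) * bx - math.sin(th) * by,
            y + math.sin(th) * bx + math.cos(th) * by,
            wrap_angle(th + bth))

def inverse(a):
    """SE(2) inverse: (R, t) -> (R^T, -R^T t)."""
    x, y, th = a
    c, s = math.cos(th), math.sin(th)
    return (-c * x - s * y, s * x - c * y, wrap_angle(-th))

def ate_rmse(est, gt):
    """Absolute trajectory error: RMSE of position differences (no alignment)."""
    sq = [(e[0] - g[0]) ** 2 + (e[1] - g[1]) ** 2 for e, g in zip(est, gt)]
    return math.sqrt(sum(sq) / len(sq))

def rpe_rmse(est, gt, delta=1):
    """Relative pose error over `delta` steps: (translational RMSE, rotational RMSE in rad)."""
    t_sq, r_sq = [], []
    for i in range(len(gt) - delta):
        d_gt = compose(inverse(gt[i]), gt[i + delta])     # true relative motion
        d_est = compose(inverse(est[i]), est[i + delta])  # estimated relative motion
        err = compose(inverse(d_gt), d_est)               # residual motion
        t_sq.append(err[0] ** 2 + err[1] ** 2)
        r_sq.append(err[2] ** 2)
    n = len(t_sq)
    return math.sqrt(sum(t_sq) / n), math.sqrt(sum(r_sq) / n)

# Usage: the estimate overshoots the second forward step by 5 cm.
gt = [(0.0, 0.0, 0.0), (0.25, 0.0, 0.0), (0.5, 0.0, 0.0)]
est = [(0.0, 0.0, 0.0), (0.25, 0.0, 0.0), (0.55, 0.0, 0.0)]
print(ate_rmse(est, gt))  # ~0.0289
print(rpe_rmse(est, gt))  # translational ~0.0354, rotational ~0.0
```

ATE accumulates drift over the whole episode, whereas RPE isolates the per-step error of each relative pose estimate, which is the quantity a VO module directly produces.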

Paper Structure

This paper contains 18 sections, 1 equation, 4 figures, 4 tables, 1 algorithm.

Figures (4)

  • Figure 1: Point-Nav Task: The agent must navigate from its initial location (blue square) to a goal location (red square), specified as goal coordinates relative to its initial location, using only its noisy RGB-D observations and noisy actuation. The agent's path and the oracle's path are shown as blue and green lines, respectively, on the top-down map.
  • Figure 2: Overall Pipeline of the PointNav Agent: The agent observes $O_t$ upon executing action $a_{t-1}$. The current and previous observations ($O_t$, $O_{t-1}$) and $a_{t-1}$ are fed into the VO method, which outputs the current agent pose by integrating the relative pose estimates up to time $t$. The current observation $O_t$ and the goal location updated w.r.t. the current agent pose are provided to the nav-policy, which determines the next action $a_t$.
  • Figure 3: Geometric Coarse Pose Estimation (GCPE) module: it takes an RGB-D pair and the action-prior pose as input and estimates a coarse relative pose as output.
  • Figure 4: Neural Fine Pose Regression (NFPR) module: it takes an RGB-D pair, the action-prior pose, and the coarse prior pose from the GCPE module as input, and regresses the fine relative pose as output.
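
The pose integration and goal update described for Figure 2, together with the action-prior pose that GCPE and NFPR consume, can be sketched in SE(2). This is a minimal sketch under stated assumptions, not the authors' implementation: poses are (x, y, theta) with x forward, y left, and theta counterclockwise, and the hypothetical `ACTION_PRIOR` table assumes the Habitat PointNav action space with 0.25 m forward steps and 30° turns, which the paper may configure differently. Integrating the nominal action priors directly, as in the usage lines, is plain dead reckoning; in the MPVO pipeline the integrated relative poses would instead be the fine estimates regressed by NFPR.

```python
import math

# Hypothetical nominal motion per discrete action: (dx forward [m], dy left [m], dtheta [rad]).
# ASSUMED values: 0.25 m forward step and 30-degree turns (Habitat PointNav-style action space).
ACTION_PRIOR = {
    "MOVE_FORWARD": (0.25, 0.0, 0.0),
    "TURN_LEFT": (0.0, 0.0, math.radians(30)),
    "TURN_RIGHT": (0.0, 0.0, -math.radians(30)),
    "STOP": (0.0, 0.0, 0.0),
}

def wrap_angle(a):
    """Wrap an angle in radians to [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi

def compose(pose, rel):
    """Apply a relative motion `rel`, expressed in the agent's current frame, to a world pose."""
    x, y, th = pose
    dx, dy, dth = rel
    return (x + math.cos(th) * dx - math.sin(th) * dy,
            y + math.sin(th) * dx + math.cos(th) * dy,
            wrap_angle(th + dth))

def integrate(rel_poses, start=(0.0, 0.0, 0.0)):
    """Chain relative pose estimates from the episode start to obtain the current pose."""
    pose = start
    for rel in rel_poses:
        pose = compose(pose, rel)
    return pose

def goal_in_agent_frame(goal, pose):
    """Express a goal (x, y), given relative to the episode start, in the agent's egocentric frame."""
    x, y, th = pose
    gx, gy = goal[0] - x, goal[1] - y
    c, s = math.cos(th), math.sin(th)
    return (c * gx + s * gy, -s * gx + c * gy)

# Usage: turn left 90 degrees, then move 1 m forward; the goal starts 1 m ahead and 1 m left.
actions = ["TURN_LEFT"] * 3 + ["MOVE_FORWARD"] * 4
pose = integrate([ACTION_PRIOR[a] for a in actions])
print(pose)                                   # approximately (0.0, 1.0, pi/2)
print(goal_in_agent_frame((1.0, 1.0), pose))  # approximately (0.0, -1.0): 1 m to the agent's right
```

Wrapping theta after every composition keeps headings in [-pi, pi), so errors in the integrated heading do not grow without bound in representation over long episodes.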