BEINGS: Bayesian Embodied Image-goal Navigation with Gaussian Splatting

Wugang Meng, Tianfu Wu, Huan Yin, Fumin Zhang

TL;DR

This work proposes BEINGS (Bayesian Embodied Image-goal Navigation with Gaussian Splatting), a novel method that formulates image-goal navigation (ImageNav) as an optimal control problem within a model predictive control (MPC) framework and uses 3D Gaussian Splatting (3DGS) as a scene prior to predict future observations.

Abstract

Image-goal navigation (ImageNav) enables a robot to reach the location where a target image was captured, using visual cues for guidance. However, current methods either rely on data-hungry, computationally expensive learning-based approaches or lack efficiency in complex environments because of insufficient exploration strategies. To address these limitations, we propose Bayesian Embodied Image-goal Navigation with Gaussian Splatting (BEINGS), a novel method that formulates ImageNav as an optimal control problem within a model predictive control (MPC) framework. BEINGS uses 3D Gaussian Splatting (3DGS) as a scene prior to predict future observations, enabling efficient, real-time navigation decisions grounded in the robot's sensory experience. By integrating Bayesian updates, our method dynamically refines the robot's strategy without requiring extensive prior experience or data. We validate the algorithm through extensive simulations and physical experiments, demonstrating its potential for embodied robot systems in visually complex scenarios.
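The Bayesian update described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: the environment is split into K regions $\mathbb{S}_1, \dots, \mathbb{S}_K$, the belief $\pi$ over where the target image was captured is a discrete distribution over those regions, and an image-similarity score in [0, 1] supports the regions currently in view when high and the remaining regions when low. The function name bayes_update and this likelihood model are hypothetical; the paper's process-aware update is not reproduced here.

```python
import numpy as np


def bayes_update(prior, observed, similarity, eps=1e-3):
    """One discrete Bayesian update of the belief over which region holds the target.

    prior      : (K,) probabilities over regions S_1..S_K (sums to 1)
    observed   : (K,) boolean mask of regions covered by the current view (hypothetical)
    similarity : scalar in [0, 1] between the current image and the goal image
    """
    # Clip so that no region's probability is ever driven exactly to zero.
    s = float(np.clip(similarity, eps, 1.0 - eps))
    # Hypothetical likelihood: high similarity supports the observed regions,
    # low similarity supports the unobserved ones.
    likelihood = np.where(observed, s, 1.0 - s)
    # Bayes' rule: posterior is proportional to likelihood times prior.
    posterior = likelihood * prior
    return posterior / posterior.sum()


prior = np.full(4, 0.25)                       # uniform belief over 4 regions
in_view = np.array([True, False, False, False])
print(bayes_update(prior, in_view, 0.9))
```

With a uniform prior over four regions, only the first region in view, and a similarity of 0.9, the posterior on that region rises from 0.25 to 0.75, and each of the other three falls to about 0.083.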

Paper Structure (18 sections, 16 equations, 8 figures, 1 table, 1 algorithm)

Figures (8)

  • Figure 1: A schematic diagram of BEINGS. The bird’s-eye view shows a 3DGS map that the robot uses to navigate toward a target image. The robot estimates the target’s location using Bayesian principles, based on the similarity between its current observation (top right) and the image goal (bottom right). The yellow, blue, and green dotted lines show predicted rollouts, and the images in the correspondingly colored blocks (lower left) show potential future observations rendered by 3DGS. The orange dash-dotted line marks the optimal rollout selected for navigation.
  • Figure 2: System overview of BEINGS for image-goal navigation. When the robot acquires a new image as a measurement, BEINGS updates its estimate of the distribution $\pi$ over the target image's location, using the similarity between the measurement and the image goal in accordance with Bayesian principles. It then executes Monte Carlo-based MPC by sampling $N$ control sequences from the current control distribution. Using the robot's motion model and the 3DGS map, it generates $N$ image sequences of length $T$. Each image sequence is scored, and the control distribution is resampled based on these scores, incrementally approximating the optimal control distribution that guides the robot toward the target image.
  • Figure 3: Renderable radiance field map using Gaussian splatting. Given an arbitrary camera pose, 3DGS can render an image that closely resembles the real image captured at that pose.
  • Figure 4: ImageNav process using BEINGS. The figure shows the Monte Carlo-based MPC process (top) and how the probability that the target lies in each $\mathbb{S}_i$ changes with the process-aware Bayesian update (bottom).
  • Figure 5: Miniature blimp robot. In this study, the body frame is set as the camera frame, and control commands are applied to the body frame for navigation.
  • ...and 3 more figures
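The Monte Carlo-based MPC loop described in the Figure 2 caption can be sketched as follows. This is a hedged illustration, not the authors' implementation. It assumes a toy 2D single-integrator motion model. It replaces 3DGS rendering plus image-similarity scoring with a hypothetical score, the negative distance of the final pose to a target estimate. It also refits a Gaussian control distribution from softmax-weighted samples, an MPPI/cross-entropy-style scheme that may differ from the paper's resampling. The names rollout, score, and mc_mpc, and all parameter values, are hypothetical.

```python
import numpy as np


def rollout(x0, controls, dt=0.1):
    """Hypothetical single-integrator motion model: x_{t+1} = x_t + dt * u_t.

    Returns the (T, 2) sequence of positions reached by the control sequence.
    """
    return x0 + dt * np.cumsum(controls, axis=0)


def score(traj, target):
    """Stand-in for rendering each pose with 3DGS and scoring image similarity
    to the goal image: here, negative distance of the final pose to a target estimate."""
    return -np.linalg.norm(traj[-1] - target)


def mc_mpc(x0, target, T=10, N=256, iters=5, temperature=0.1, seed=0):
    """Monte Carlo MPC: sample, score, and refit a Gaussian control distribution."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((T, 2))  # per-step control mean
    std = np.ones((T, 2))    # per-step control standard deviation
    for _ in range(iters):
        # Sample N control sequences from the current control distribution.
        U = mean + std * rng.standard_normal((N, T, 2))
        # Roll out each sequence and score the predicted trajectory.
        scores = np.array([score(rollout(x0, u), target) for u in U])
        # Softmax weights over scores (higher score -> larger weight);
        # subtracting the max keeps the exponentials numerically stable.
        w = np.exp((scores - scores.max()) / temperature)
        w /= w.sum()
        # Refit the Gaussian control distribution to the weighted samples.
        mean = np.einsum("n,ntd->td", w, U)
        std = np.sqrt(np.einsum("n,ntd->td", w, (U - mean) ** 2)) + 1e-3
    return mean  # execute mean[0], then re-plan (receding horizon)


x0 = np.zeros(2)
target = np.array([1.0, 0.5])
plan = mc_mpc(x0, target)
print(rollout(x0, plan)[-1])  # planned final position
```

In a receding-horizon loop, only the first control, mean[0], would be executed before re-planning from the next observation and the updated belief. Because the sketched motion model is linear in the controls, the refitted mean sequence lands at the weighted average of the sampled final positions, which concentrates near the best-scoring rollouts.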