PG-SLAM: Photo-realistic and Geometry-aware RGB-D SLAM in Dynamic Environments

Haoang Li, Xiangqi Meng, Xingxing Zuo, Zhe Liu, Hesheng Wang, Daniel Cremers

TL;DR

This work proposes a photo-realistic and geometry-aware red-green-blue-depth (RGB-D) SLAM method based on Gaussian splatting that outperforms state-of-the-art approaches in camera localization and scene mapping.

Abstract

Simultaneous localization and mapping (SLAM) has achieved impressive performance in static environments. However, SLAM in dynamic environments remains an open question. Many methods directly filter out dynamic objects, resulting in incomplete scene reconstruction and limited accuracy of camera localization. Other works represent dynamic objects with point clouds, sparse joints, or coarse meshes, which fail to provide a photo-realistic representation. To overcome these limitations, we propose a photo-realistic and geometry-aware RGB-D SLAM method by extending Gaussian splatting. Our method is composed of three main modules to 1) map the dynamic foreground including non-rigid humans and rigid items, 2) reconstruct the static background, and 3) localize the camera. To map the foreground, we focus on modeling the deformations and/or motions. We consider the shape priors of humans and exploit geometric and appearance constraints of humans and items. For background mapping, we design an optimization strategy between neighboring local maps by integrating an appearance constraint into geometric alignment. For camera localization, we leverage both the static background and the dynamic foreground to increase the number of observations for noise compensation. We explore the geometric and appearance constraints by associating 3D Gaussians with 2D optical flows and pixel patches. Experiments on various real-world datasets demonstrate that our method outperforms state-of-the-art approaches in terms of camera localization and scene representation. The source code will be made publicly available upon paper acceptance.

Paper Structure

This paper contains 46 sections, 17 equations, 13 figures, 5 tables.

Figures (13)

  • Figure 1: Overview of our SLAM method. Given sequential RGB-D images obtained in a dynamic environment, our method can not only reconstruct the static background and localize the camera, but also map the dynamic foreground. By optimizing Gaussians based on appearance and geometric constraints, our method can provide photo-realistic scene representation and accurate camera localization.
  • Figure 2: Initialization of human Gaussians. Given the first RGB-D images, we generate an SMPL mesh up to scale, followed by transforming it to the camera frame based on the estimated transformation of the root joint. We further determine the real scale of the SMPL mesh based on the depth of the root joint. Then we attach a set of Gaussians to the re-scaled SMPL mesh, and optimize these Gaussians based on the appearance constraint.
  • Figure 3: Update of human Gaussians. In the SMPL frame, we use a neural network $\mathcal{D}$ to deform Gaussians at time $t$ into new Gaussians at time $t+1$. Then we transform the deformed Gaussians to the $(t+1)$-th camera frame using the transformation associated with the root joint. Finally, we optimize these Gaussians and the network $\mathcal{D}$ based on the appearance constraint.
  • Figure 4: Rigid transformation and addition of item Gaussians. We first estimate the optical flow and back-project depth images to establish 3D-3D point correspondences. Then we use these correspondences to roughly estimate the transformation, followed by optimizing the Gaussians and the transformation based on the appearance constraint. Finally, we estimate the new observation mask that guides the addition of Gaussians using the appearance constraint.
  • Figure 5: Optimization between the $n$-th and $(n+1)$-th local maps. Here, we show the centers of Gaussians. We iteratively align Gaussians $\mathcal{G}_{n+1}$ to Gaussians $\mathcal{G}_{n}$ based on the geometric constraint. To improve the robustness of optimization, we integrate the appearance constraint into each iteration. Gaussians $\mathcal{G}_{n+1}$ are rendered by multiple cameras associated with Gaussians $\mathcal{G}_{n}$.
  • ...and 8 more figures
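
The rough transformation step in the Figure 4 caption (optical flow plus back-projected depth giving 3D-3D correspondences, then a coarse rigid estimate) can be illustrated with a minimal sketch. This is not the authors' implementation: the pinhole intrinsics `(fx, fy, cx, cy)`, the function names `back_project`, `flow_correspondences`, and `kabsch`, the nearest-pixel depth lookup, and the use of the closed-form Kabsch solver are all assumptions made for illustration. The caption does not say which solver the paper uses, and the later appearance-based refinement is not shown.

```python
import numpy as np


def back_project(pixels, depths, fx, fy, cx, cy):
    """Lift (u, v) pixels of shape (N, 2) with depths (N,) to camera-frame points (N, 3)."""
    pixels = np.asarray(pixels, dtype=float)
    depths = np.asarray(depths, dtype=float)
    x = (pixels[:, 0] - cx) * depths / fx
    y = (pixels[:, 1] - cy) * depths / fy
    return np.stack([x, y, depths], axis=1)


def flow_correspondences(pixels_t, flow, depth_t, depth_t1, intrinsics):
    """Pair 3D points across frames t and t+1 (hypothetical helper).

    pixels_t: (N, 2) integer (u, v) pixels in frame t
    flow:     (H, W, 2) optical flow from frame t to frame t+1
    depth_*:  (H, W) depth maps; non-positive values are treated as invalid
    """
    pixels_t = np.asarray(pixels_t)
    u, v = pixels_t[:, 0], pixels_t[:, 1]
    target = pixels_t + flow[v, u]
    # Assumption: read depth at the nearest pixel of the flow target.
    ut = np.rint(target[:, 0]).astype(int)
    vt = np.rint(target[:, 1]).astype(int)
    h, w = depth_t1.shape
    inside = (ut >= 0) & (ut < w) & (vt >= 0) & (vt < h)
    u, v, ut, vt = u[inside], v[inside], ut[inside], vt[inside]
    d0, d1 = depth_t[v, u], depth_t1[vt, ut]
    valid = (d0 > 0) & (d1 > 0)
    p0 = back_project(np.stack([u, v], axis=1)[valid], d0[valid], *intrinsics)
    p1 = back_project(np.stack([ut, vt], axis=1)[valid], d1[valid], *intrinsics)
    return p0, p1


def kabsch(src, dst):
    """Least-squares rotation R and translation t such that dst ≈ src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


if __name__ == "__main__":
    # Synthetic check: recover a known rigid motion from noise-free correspondences.
    rng = np.random.default_rng(0)
    src = rng.normal(size=(20, 3))
    a = 0.3
    R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                       [np.sin(a), np.cos(a), 0.0],
                       [0.0, 0.0, 1.0]])
    t_true = np.array([0.1, -0.2, 0.5])
    R, t = kabsch(src, src @ R_true.T + t_true)
    print(np.allclose(R, R_true), np.allclose(t, t_true))
```

In practice, flow and depth noise would likely call for an outlier-robust wrapper (for example RANSAC) around the closed-form solve; this sketch omits that and treats the result only as the initial guess for the subsequent optimization.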