Optimizing NeRF-based SLAM with Trajectory Smoothness Constraints

Yicheng He, Guangcheng Chen, Hong Zhang

TL;DR

TS-SLAM (TS for Trajectory Smoothness) imposes smoothness constraints on camera trajectories by representing them as uniform cubic B-splines, whose continuous acceleration guarantees smooth camera motion.

Abstract

The joint optimization of Neural Radiance Fields (NeRF) and camera trajectories has been widely applied in SLAM tasks due to its superior dense mapping quality and consistency. NeRF-based SLAM learns camera poses using constraints imposed by the implicit map representation. A widely observed consequence of constraints of this form is jerky and physically unrealistic estimated camera motion, which in turn degrades map quality. To address this deficiency of current NeRF-based SLAM, we propose TS-SLAM (TS for Trajectory Smoothness). It introduces smoothness constraints on camera trajectories by representing them as uniform cubic B-splines, whose continuous acceleration guarantees smooth camera motion. Benefiting from the differentiability and local control properties of B-splines, TS-SLAM can incrementally learn the control points end-to-end using a sliding window paradigm. Additionally, we regularize camera trajectories with a dynamics prior to smooth them further. Experimental results demonstrate that TS-SLAM achieves superior trajectory accuracy and improves mapping quality compared with NeRF-based SLAM that does not employ these smoothness constraints.
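The continuity property mentioned in the abstract can be checked numerically. The following is a minimal sketch, not the authors' implementation: it evaluates a uniform cubic B-spline for 3D positions only, using the standard textbook basis matrix, and compares position, velocity, and acceleration on both sides of each segment boundary. The names `B`, `eval_segment`, and `ctrl` are illustrative assumptions, and the treatment of camera rotations is deliberately left out.

```python
import numpy as np

# Standard basis matrix of the uniform cubic B-spline (textbook form).
B = np.array([
    [ 1.0,  4.0,  1.0, 0.0],
    [-3.0,  0.0,  3.0, 0.0],
    [ 3.0, -6.0,  3.0, 0.0],
    [-1.0,  3.0, -3.0, 1.0],
]) / 6.0


def eval_segment(ctrl, i, u, order=0):
    """Evaluate spline segment i at local time u in [0, 1].

    ctrl  : (N, 3) array of control points (hypothetical positions).
    i     : segment index; the segment uses control points i .. i+3.
    order : 0 = position, 1 = velocity, 2 = acceleration (w.r.t. u).
    """
    powers = {
        0: [1.0, u, u**2, u**3],
        1: [0.0, 1.0, 2 * u, 3 * u**2],
        2: [0.0, 0.0, 2.0, 6 * u],
    }
    U = np.array(powers[order])
    return U @ B @ ctrl[i:i + 4]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ctrl = rng.normal(size=(8, 3))  # 8 control points -> 5 segments
    for i in range(4):
        for order, name in enumerate(["position", "velocity", "acceleration"]):
            end = eval_segment(ctrl, i, 1.0, order)
            start = eval_segment(ctrl, i + 1, 0.0, order)
            print(i, name, np.allclose(end, start))
```

Each segment depends on exactly four control points, and the end of one segment agrees with the start of the next up to the second derivative, which is the sense in which acceleration is continuous.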

Paper Structure

This paper contains 27 sections, 10 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Camera pose constraint schemes for (a) traditional SLAM, (b) coupled NeRF-SLAM, and (c) our method. The estimated trajectory (red line) of current coupled NeRF-SLAM is jerky due to indirectly constrained optimization. Our method improves trajectory accuracy and enhances reconstruction quality by introducing smoothness constraints derived from the B-spline representation of the camera trajectory.
  • Figure 2: TS-SLAM system pipeline. The TS-SLAM system consists of two parallel threads: tracking and mapping. The mapping thread includes local and global Bundle Adjustment (BA) to optimize the control points and the map.
  • Figure 3: End-to-end learning of control points. Four control points influence the pose at a certain moment on the curve, indirectly introducing smoothness constraints among camera poses that are temporally close. The control points are learned end-to-end by minimizing $\mathcal{L}_{NeRF}$ and $\mathcal{L}_{DR}$.
  • Figure 4: Local bundle adjustment. The sliding window contains the latest $M$ control points and the RGBD observations within the interval $[t_{i+1-M}, t_{i+1})$. The red-framed square represents the newly added control point that needs to be initialized. The discrete poses are the camera poses output by the tracking thread.
  • Figure 5: Qualitative results of TS-SLAM on the ScanNet dataset. The ground truth trajectory is shown in black, and the estimated trajectory is shown in red. Our method achieves more accurate camera tracking results than the baselines and improves mapping quality (right column).
  • ...and 1 more figure
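The local control property referred to in the captions of Figures 3 and 4 can be illustrated with a short sketch. It is an assumption-laden toy, not the paper's pipeline: it uses 3D positions instead of full camera poses, and the helpers `sample`, `affected_segments`, and `latest_window` are hypothetical names introduced here. It shows that moving one control point changes only the four segments that use it, which is what makes optimizing just the latest $M$ control points in a sliding window reasonable.

```python
import numpy as np

# Standard basis matrix of the uniform cubic B-spline (textbook form).
B = np.array([
    [ 1.0,  4.0,  1.0, 0.0],
    [-3.0,  0.0,  3.0, 0.0],
    [ 3.0, -6.0,  3.0, 0.0],
    [-1.0,  3.0, -3.0, 1.0],
]) / 6.0


def sample(ctrl, n_per_seg=5):
    """Sample every segment at n_per_seg local times in [0, 1).

    Returns an array of shape (n_segments, n_per_seg, 3).
    """
    u = np.linspace(0.0, 1.0, n_per_seg, endpoint=False)
    U = np.stack([np.ones_like(u), u, u**2, u**3], axis=1)  # (n_per_seg, 4)
    n_seg = len(ctrl) - 3
    return np.stack([U @ B @ ctrl[i:i + 4] for i in range(n_seg)])


def affected_segments(ctrl, k, delta):
    """Indices of segments whose samples change when control point k moves."""
    moved = ctrl.copy()
    moved[k] += delta
    diff = np.abs(sample(moved) - sample(ctrl)).max(axis=(1, 2))
    return np.flatnonzero(diff > 1e-12)


def latest_window(n_ctrl, M):
    """Indices of the latest M control points (hypothetical window helper)."""
    return np.arange(max(0, n_ctrl - M), n_ctrl)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ctrl = rng.normal(size=(10, 3))  # 10 control points -> 7 segments
    print(affected_segments(ctrl, 5, np.array([1.0, 0.0, 0.0])))
    print(latest_window(len(ctrl), 4))
```

With 10 control points, perturbing control point 5 changes only segments 2 through 5; the remaining segments are untouched, so control points that have left the window can stay fixed without affecting the part of the trajectory still being optimized.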