Camera Motion Estimation from RGB-D-Inertial Scene Flow

Samuel Cerezo, Javier Civera

TL;DR

This work introduces a tightly coupled RGB-D–inertial scene flow framework for camera motion estimation in rigid environments, leveraging pre-integrated IMU residuals and depth-based velocity constraints within a sliding-window optimization. By jointly minimizing visual and inertial residuals and employing marginalization to retain information from past frames, the method achieves higher accuracy and robustness than RGB-D-only approaches, as demonstrated on synthetic ICL-NUIM and real OpenLORIS-Scene data. The key contributions are the integration of inertial data into a dense RGB-D flow odometry formulation, the use of gravity direction on $S^2$ for stable state representation, and a practical marginalization strategy that preserves past information while keeping the optimization tractable. Overall, the approach provides improved camera motion estimates and IMU state tracking, with potential benefits for indoor robotics and AR applications where multi-sensor fusion enhances reliability.

Abstract

In this paper, we introduce a novel formulation for camera motion estimation that integrates RGB-D images and inertial data through scene flow. Our goal is to accurately estimate the camera motion in a rigid 3D environment, along with the state of the inertial measurement unit (IMU). Our proposed method offers the flexibility to operate as a multi-frame optimization or to marginalize older data, thus effectively utilizing past measurements. To assess the performance of our method, we conducted evaluations using both synthetic data from the ICL-NUIM dataset and real data sequences from the OpenLORIS-Scene dataset. Our results show that the fusion of these two sensors enhances the accuracy of camera motion estimation when compared to using only visual data.

Paper Structure

This paper contains 17 sections, 26 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Illustration of the temporal notation for RGB-D images, IMU measurements, and the marginalization and optimization windows.
  • Figure 2: Optimization using a sliding window over N frames while the camera moves along the trajectory.
  • Figure 3: Factor graph representation for the different modes of operation. Blue and green shapes contain the variables to be estimated. (a) With two frames, only one visual and one inertial residual are used. (b) With three frames, there are two visual residuals but only one inertial residual is used. (c) The same situation as in (b), but with two inertial residuals. The difference between (b) and (c) is the inertial constraint imposed by the last added frame.
  • Figure 4: Factor graph for a sliding window containing 3 frames. Blue and green shapes contain the variables to be estimated. (a) When a new frame arrives, a visual and an inertial residual are added and marginalization is performed. (b) After marginalization, a new prior residual is added to the cost function.
  • Figure 5: Marginalization example. We start with a Hessian matrix $\mathbf{H}$ after optimization with $N=4$. We want to marginalize $\mathbf{v}_i$ and $\boldsymbol{\omega}_i$. The marginalized Hessian matrix $\mathbf{H}^*$ corresponds to the Schur complement of $\mathbf{H}_{\alpha\alpha}$. This calculation transfers the information constraints of the variable being eliminated to its adjacent nodes, adding shared information between these variables (green cells).
  • ...and 1 more figure
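
The marginalization described in the Figure 5 caption can be illustrated with a small numerical sketch. It is not the authors' implementation: the function name `marginalize`, the linear-system convention $\mathbf{H}\mathbf{x} = \mathbf{b}$, and the toy 3-state chain are assumptions made here for illustration. The eliminated block plays the role of $\mathbf{H}_{\alpha\alpha}$, and the reduced matrix $\mathbf{H}^*$ is the corresponding Schur complement.

```python
import numpy as np


def marginalize(H, b, drop):
    """Eliminate the variables indexed by `drop` from the system H x = b.

    Returns the reduced matrix H*, the reduced right-hand side b*, and the
    indices of the variables that are kept. (Hypothetical helper.)
    """
    n = H.shape[0]
    drop = np.asarray(drop)
    keep = np.setdiff1d(np.arange(n), drop)

    H_aa = H[np.ix_(drop, drop)]   # block of the variables being eliminated
    H_ab = H[np.ix_(drop, keep)]   # coupling: eliminated -> kept
    H_ba = H[np.ix_(keep, drop)]   # coupling: kept -> eliminated
    H_bb = H[np.ix_(keep, keep)]   # block of the variables that remain

    # Solve H_aa X = [H_ab | b_a] rather than forming an explicit inverse.
    X = np.linalg.solve(H_aa, np.column_stack([H_ab, b[drop]]))

    H_star = H_bb - H_ba @ X[:, :-1]    # Schur complement
    b_star = b[keep] - H_ba @ X[:, -1]  # matching right-hand side
    return H_star, b_star, keep


# Toy chain x0 - x1 - x2: x0 and x2 are only linked through x1.
H = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
b = np.array([1.0, 0.0, 1.0])

H_star, b_star, keep = marginalize(H, b, drop=[1])
print(H_star)  # off-diagonal entries become -0.5: x0 and x2 are now coupled
```

In this toy case the entry linking `x0` and `x2` is zero before elimination and nonzero afterwards, which mirrors the shared information (green cells) mentioned in the caption. Solving the reduced system yields the same values for the kept variables as solving the full one.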