Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image

Jerred Chen, Ronald Clark

TL;DR

This work treats motion blur not as a nuisance but as a rich cue for estimating fast camera motion from a single blurred image. It introduces a two-stage pipeline that first predicts dense optical flow and monocular depth from the blurred frame, then solves a differentiable linear least-squares problem to recover instantaneous 6-DoF velocity, producing IMU-like measurements in real time. Trained on a synthetic blur dataset derived from ScanNet++v2 and refined with real-world data, the approach achieves state-of-the-art angular and translational velocity estimates and outperforms baselines like MASt3R and COLMAP on real sequences. The method demonstrates strong robustness to aggressive motion, operates at 30 FPS, and offers drift-free velocity estimates that can enhance real-time state estimation and navigation in robotics and AR/VR applications.

Abstract

In many robotics and VR/AR applications, fast camera motions lead to a high level of motion blur, causing existing camera pose estimation methods to fail. In this work, we propose a novel framework that leverages motion blur as a rich cue for motion estimation rather than treating it as an unwanted artifact. Our approach works by predicting a dense motion flow field and a monocular depth map directly from a single motion-blurred image. We then recover the instantaneous camera velocity by solving a linear least squares problem under the small motion assumption. In essence, our method produces an IMU-like measurement that robustly captures fast and aggressive camera movements. To train our model, we construct a large-scale dataset with realistic synthetic motion blur derived from ScanNet++v2 and further refine our model by training end-to-end on real data using our fully differentiable pipeline. Extensive evaluations on real-world benchmarks demonstrate that our method achieves state-of-the-art angular and translational velocity estimates, outperforming current methods like MASt3R and COLMAP.
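The velocity-recovery step can be illustrated with a minimal NumPy sketch. It assumes the classical rigid-scene motion-field equations in normalized image coordinates and a pinhole intrinsics matrix `K`; the paper's exact parameterization, sign conventions, and weighting may differ. The function names `motion_field_matrix` and `solve_velocity` and the `exposure_s` parameter are hypothetical, introduced only for illustration. Under the small motion assumption, the flow accumulated over the exposure is treated as velocity times exposure time, so each pixel contributes two linear equations in the six unknowns.

```python
import numpy as np

def motion_field_matrix(depth, K):
    """Stack the 2N x 6 matrix A such that A @ [v, w] gives the per-pixel
    motion field in normalized image coordinates (assumed classical model)."""
    H, W = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    us, vs = np.meshgrid(np.arange(W), np.arange(H))
    x = ((us - cx) / fx).ravel()          # normalized x coordinate
    y = ((vs - cy) / fy).ravel()          # normalized y coordinate
    inv_z = 1.0 / depth.ravel()           # inverse metric depth
    zeros = np.zeros_like(x)
    # Horizontal flow rows: (-vx + x*vz)/Z + x*y*wx - (1 + x^2)*wy + y*wz
    A_u = np.stack([-inv_z, zeros, x * inv_z, x * y, -(1.0 + x**2), y], axis=1)
    # Vertical flow rows:   (-vy + y*vz)/Z + (1 + y^2)*wx - x*y*wy - x*wz
    A_v = np.stack([zeros, -inv_z, y * inv_z, 1.0 + y**2, -x * y, -x], axis=1)
    return np.concatenate([A_u, A_v], axis=0)

def solve_velocity(flow, depth, K, exposure_s):
    """flow: (H, W, 2) pixel displacement over the exposure,
    depth: (H, W) metric depth, K: 3x3 intrinsics, exposure_s: seconds.
    Returns (linear velocity, angular velocity) in the camera frame."""
    A = motion_field_matrix(depth, K)
    # Convert pixel flow to normalized coordinates to match A.
    b = np.concatenate([flow[..., 0].ravel() / K[0, 0],
                        flow[..., 1].ravel() / K[1, 1]])
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    # theta is the motion over the exposure; divide to get a rate.
    return theta[:3] / exposure_s, theta[3:] / exposure_s
```

Because the solve is a plain linear least-squares problem, the same structure can be written in an autodiff framework so that gradients flow from a velocity loss back to the flow and depth predictions, which is the property the abstract relies on for end-to-end training.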

Paper Structure

This paper contains 18 sections, 16 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Existing methods rely on establishing correspondences between multiple frames to estimate inter-frame camera motion (a). This leads to failures during fast motion with motion blur. We propose a method that can estimate intra-frame motion from a single image (b), making our method robust to aggressive motions.
  • Figure 2: Method overview. Given a single motion-blurred image, we pass it through the network to predict the flow field and metric depth (Section \ref{sec:flowdepth}). These are then formulated as a linear system, whose optimal velocity parameters are solved for using linear least squares (Section \ref{sec:velocity}). Because the linear solver is fully differentiable, we can train the entire network end-to-end, supervised on the camera motion.
  • Figure 3: Overview of our synthetic dataset generation process. After preprocessing the dataset, we run selected frames through an interpolation network and use its output to synthesize the blurred image. We also take the first and last virtual frames to generate $\hat{\mathcal{D}}$, which is subsequently used for computing $\hat{\mathcal{F}}$.
  • Figure 4: Visualization of the predicted velocities for the billiards sequence using our method, MASt3R, and COLMAP (w/ DISK + LightGlue). The shaded area under each curve shows the error between the predicted velocity and the GT velocity. Our translations and rotations are significantly more accurate than those of MASt3R. While COLMAP with DISK + LightGlue feature matching does well on rotations, our method significantly outperforms it on translations.
  • Figure 5: Error CDFs across all test sequences. The left and right plots show the distributions of translational and rotational errors, respectively. Curves closer to the top-left corner are better.
  • ...and 2 more figures
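
The blur synthesis step described in the Figure 3 caption can be sketched as follows. This is a minimal illustration, assuming the blurred image is formed by averaging the interpolated virtual frames in an approximately linear intensity space; the caption does not state how the frames are combined, so both the averaging and the `gamma=2.2` approximation are assumptions, and the function name `synthesize_blur` is hypothetical.

```python
import numpy as np

def synthesize_blur(frames, gamma=2.2):
    """Average a stack of sharp virtual frames into one blurred image.

    frames: array of shape (T, H, W, C) with values in [0, 1],
            assumed to be gamma-encoded (sRGB-like).
    gamma:  assumed display gamma used to approximate linear intensity.
    """
    frames = np.clip(np.asarray(frames, dtype=np.float64), 0.0, 1.0)
    linear = frames ** gamma                 # approximate linear light
    blurred_linear = linear.mean(axis=0)     # temporal average over the exposure
    return blurred_linear ** (1.0 / gamma)   # re-encode for display/training
```

Averaging in linear space mimics how a sensor integrates light during the exposure; averaging the encoded values directly is simpler but tends to darken bright streaks.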