Multimotion Visual Odometry (MVO)

Kevin M. Judd, Jonathan D. Gammell

TL;DR

MVO tackles multimotion estimation by extending visual odometry with a motion-centric, graph-based multilabeling framework that segments and estimates full $SE\left(3\right)$ trajectories for all motions, including the sensor. It combines tracklet graphs, new label proposals, and a soft-energy optimization (Residual, Smoothness, Complexity) solved via CORAL, then refines with batch or sliding-window SE(3) estimators using pose-only, pose-velocity, or pose-velocity-acceleration priors. The approach supports geocentric estimation of third-party motions, extrapolates through occlusions with motion closure, and updates trajectories to maintain consistency across windows. Evaluations on the Oxford Multimotion Dataset and KITTI show competitive egomotion accuracy and improved multimotion tracking without appearance-based detectors, highlighting the method’s robustness to occlusions and its potential for diverse sensing modalities. Overall, MVO provides a versatile, motion-first alternative to detector-dependent multimotion tracking, with demonstrated applicability to dynamic driving and complex dynamic scenes.

Abstract

Visual motion estimation is a well-studied challenge in autonomous navigation. Recent work has focused on addressing multimotion estimation in highly dynamic environments. These environments not only comprise multiple, complex motions but also tend to exhibit significant occlusion. Estimating third-party motions simultaneously with the sensor egomotion is difficult because an object's observed motion consists of both its true motion and the sensor motion. Most previous works in multimotion estimation simplify this problem by relying on appearance-based object detection or application-specific motion constraints. These approaches are effective in specific applications and environments but do not generalize well to the full multimotion estimation problem (MEP). This paper presents Multimotion Visual Odometry (MVO), a multimotion estimation pipeline that estimates the full SE(3) trajectory of every motion in the scene, including the sensor egomotion, without relying on appearance-based information. MVO extends the traditional visual odometry (VO) pipeline with multimotion segmentation and tracking techniques. It uses physically founded motion priors to extrapolate motions through temporary occlusions and identify the reappearance of motions through motion closure. Evaluations on real-world data from the Oxford Multimotion Dataset (OMD) and the KITTI Vision Benchmark Suite demonstrate that MVO achieves good estimation accuracy compared to similar approaches and is applicable to a variety of multimotion estimation challenges.
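
The abstract's point that an object's observed motion mixes its true motion with the sensor motion can be illustrated with a small numerical sketch. The code below is not the MVO implementation. It assumes 4x4 homogeneous transforms, a world-from-camera and world-from-object pose convention, and made-up poses; the names `make_pose`, `observed_motion`, and `geocentric_motion` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def make_pose(yaw, translation):
    """Build a 4x4 homogeneous transform from a yaw angle (rad) and a translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T

def observed_motion(T_wc_0, T_wc_1, T_wo_0, T_wo_1):
    """Transform taking an object's points from the camera frame at time 0
    to the camera frame at time 1. It mixes the object's own motion with
    the (inverse) camera motion."""
    return np.linalg.inv(T_wc_1) @ T_wo_1 @ np.linalg.inv(T_wo_0) @ T_wc_0

def geocentric_motion(M_obs, T_wc_0, T_wc_1):
    """Remove the camera motion from an observed motion, leaving the
    object's motion expressed in the world frame."""
    return T_wc_1 @ M_obs @ np.linalg.inv(T_wc_0)

# Assumed (illustrative) world-from-camera and world-from-object poses.
T_wc_0 = make_pose(0.0, [0.0, 0.0, 0.0])
T_wc_1 = make_pose(0.1, [0.5, 0.0, 0.0])
T_wo_0 = make_pose(0.3, [4.0, 1.0, 0.0])
T_wo_1 = make_pose(0.5, [4.2, 1.5, 0.0])

M_obs = observed_motion(T_wc_0, T_wc_1, T_wo_0, T_wo_1)
M_geo = geocentric_motion(M_obs, T_wc_0, T_wc_1)

# The recovered geocentric motion matches the object's true world-frame motion.
print(np.allclose(M_geo, T_wo_1 @ np.linalg.inv(T_wo_0)))
```

For a static point the object transforms are the identity, so the observed motion reduces to the inverse of the camera's own motion. This is why static scene points can be used to estimate egomotion.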

Paper Structure

This paper contains 53 sections, 76 equations, 15 figures, 2 tables.

Figures (15)

  • Figure 1: Motion trajectories estimated by MVO for sequences from the KITTI [geiger2012] and OMD [judd2019ral] datasets. The KITTI segment involves an autonomous driving scenario in a residential environment where the car-mounted camera follows a van and a cyclist. The OMD segment includes four independently swinging blocks observed by a handheld camera. In both sequences, the $SE\left(3\right)$ trajectories of the camera egomotion and third-party motions are estimated simultaneously without prior knowledge of their number, appearance, or nature.
  • Figure 2: An illustration of the stereo MVO pipeline, which extends the standard VO pipeline by replacing the egomotion estimator with a multimotion estimator. MVO operates on 3D tracklets and generates the $SE\left(3\right)$ trajectory for every motion in the scene, including the sensor egomotion. The pipeline builds a neighborhood graph based on how rigidly pairs of points move over time and iteratively splits and estimates new labels using the graph. It assigns labels based on an energy functional, and merges labels that can be considered redundant until convergence. Once the segmentation converges, the labels are sanitized and a batch estimation produces the geocentric $SE\left(3\right)$ trajectories, employing motion closure to determine if newly discovered motions can be explained by the reappearance of an occluded object.
  • Figure 3: Illustrations of the MEP showing the motion of frames through time (left) and the relative point observations (right). Two independent third-party motions, $\underrightarrow{\boldsymbol{\mathcal{F}}}_{A}$ and $\underrightarrow{\boldsymbol{\mathcal{F}}}_{B}$, are observed by a moving camera, $\underrightarrow{\boldsymbol{\mathcal{F}}}_{C}$, through feature measurements on the objects, $\left\{\mathbf{p}^{{a_k}{C_k}}_{C_k}\right\}$ and $\left\{\mathbf{p}^{{b_k}{C_k}}_{C_k}\right\}$. Solving the MEP requires simultaneously segmenting and estimating the motions of these measurements.
  • Figure 4: A demonstration of the motion closure procedure showing trajectory estimates produced before (left), during (center), and after (right) an occlusion in the occlusion_2_unconstrained segment of the OMD. The trajectory of the swinging block (4, red) is directly estimated when it is visible and is extrapolated using the constant-velocity motion prior (dashed line) when the block is occluded by the moving tower (1, blue). When the block becomes unoccluded, it is rediscovered through motion closure and the estimates are interpolated to match the directly estimated trajectory.
  • Figure 5: Motion segmentation (top) and trajectories (bottom) produced by MVO using the pose-velocity estimator for the swinging_4_unconstrained data segment from the OMD. The egomotion (black, bottom) of the camera is estimated from the static points in the scene (black, top). The motions of the swinging blocks (1–4) are segmented and estimated simultaneously with the egomotion. The results for each MVO estimator for this segment are illustrated in Extensions 1–3 (see the extensions appendix).
  • ...and 10 more figures
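
The Figure 2 caption describes a neighborhood graph built from how rigidly pairs of points move over time. A minimal sketch of that idea follows, assuming that two tracked 3D points are treated as rigidly connected when their mutual distance stays nearly constant across frames. The function name `rigidity_graph`, the threshold `tol`, and the toy tracks are assumptions made for illustration; the paper's actual graph construction may differ.

```python
import numpy as np

def rigidity_graph(tracks, tol=0.05):
    """Return a boolean adjacency matrix linking points whose pairwise
    distance varies by less than `tol` over all frames.

    tracks: array of shape (num_frames, num_points, 3) holding 3D tracklets.
    """
    # Pairwise difference vectors per frame: (num_frames, N, N, 3).
    diffs = tracks[:, :, None, :] - tracks[:, None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)            # (num_frames, N, N)
    spread = dists.max(axis=0) - dists.min(axis=0)    # distance variation per pair
    adj = spread < tol
    np.fill_diagonal(adj, False)                      # no self-edges
    return adj

# Toy data: points 0 and 1 are static; points 2 and 3 translate together.
start = np.array([[0.0, 0.0, 5.0],
                  [1.0, 0.0, 5.0],
                  [0.0, 1.0, 5.0],
                  [1.0, 1.0, 5.0]])
step = np.array([[0.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0],
                 [0.5, 0.0, 0.0],
                 [0.5, 0.0, 0.0]])
tracks = np.stack([start + k * step for k in range(3)])

adj = rigidity_graph(tracks)
print(adj.astype(int))
```

In this toy case the graph connects the two static points to each other and the two moving points to each other, with no edges across the two groups. Connected groups of this kind are candidates for distinct motion labels.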