Moving Object Detection from Moving Camera Using Focus of Expansion Likelihood and Segmentation

Masahiro Ogawa, Qi An, Atsushi Yamashita

TL;DR

This work tackles moving object detection from a moving camera by combining optical flow with segmentation through Focus of Expansion (FoE) analysis. FoELS probabilistically fuses a FoE-derived moving likelihood with panoptic segmentation priors, augmented with a flow-length term to handle parallel motion and a macroscopic object-level refinement to recover entire objects. The approach leverages UniMatch for dense flow and OneFormer for panoptic segmentation, with FoE computed via RANSAC and enriched by angular and length-based cues, achieving state-of-the-art IoU on DAVIS 2016 and FBMS-59 while remaining robust to camera motion, rotation, and zoom. Despite strong performance, FoELS is computationally intensive and temporally unstable, motivating future work on efficiency and tracking enhancements.

Abstract

Separating moving and static objects from a moving camera viewpoint is essential for 3D reconstruction, autonomous navigation, and scene understanding in robotics. Existing approaches often rely primarily on optical flow, which struggles to detect moving objects in complex, structured scenes involving camera motion. To address this limitation, we propose Focus of Expansion Likelihood and Segmentation (FoELS), a method based on the core idea of integrating both optical flow and texture information. FoELS computes the focus of expansion (FoE) from optical flow and derives an initial motion likelihood from the outliers of the FoE computation. This likelihood is then fused with a segmentation-based prior to estimate the final moving probability. The method effectively handles challenges including complex structured scenes, rotational camera motion, and parallel motion. Comprehensive evaluations on the DAVIS 2016 dataset and real-world traffic videos demonstrate its effectiveness and state-of-the-art performance.
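The FoE step described above can be illustrated with a small sketch. It assumes a mostly translating camera, so that flow vectors of static points lie on lines through the FoE: two flow vectors are enough to hypothesize an FoE, and the remaining vectors vote by angular agreement. The function name `estimate_foe_ransac`, the 5-degree threshold, and the iteration count are illustrative assumptions, not the authors' implementation, which also uses a flow-length cue and a segmentation prior.

```python
import numpy as np

def estimate_foe_ransac(points, flows, n_iters=200, angle_thresh_deg=5.0, seed=0):
    """Hypothetical helper: estimate the FoE from sparse flow with RANSAC.

    points: (N, 2) pixel positions; flows: (N, 2) flow vectors at those positions.
    Returns (foe, inlier_mask). Vectors flagged as outliers are candidates for
    independently moving pixels.
    """
    points = np.asarray(points, dtype=float)
    flows = np.asarray(flows, dtype=float)
    rng = np.random.default_rng(seed)

    # Unit flow directions, and the normal of the line each flow vector lies on.
    dirs = flows / np.maximum(np.linalg.norm(flows, axis=1), 1e-9)[:, None]
    normals = np.stack([-dirs[:, 1], dirs[:, 0]], axis=1)
    offsets = np.sum(normals * points, axis=1)  # line equation: normal . x = offset

    sin_thresh = np.sin(np.deg2rad(angle_thresh_deg))
    best_foe = None
    best_inliers = np.zeros(len(points), dtype=bool)

    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        A = normals[[i, j]]
        if abs(np.linalg.det(A)) < 1e-6:
            continue  # the two flow lines are (nearly) parallel
        foe = np.linalg.solve(A, offsets[[i, j]])  # intersection of the two lines

        # Angular disagreement between each flow direction and the radial
        # direction from the candidate FoE. |sin| ignores the sign, so both
        # expansion and contraction count as consistent (a simplification).
        radial = points - foe
        radial /= np.maximum(np.linalg.norm(radial, axis=1), 1e-9)[:, None]
        sin_angle = np.abs(radial[:, 0] * dirs[:, 1] - radial[:, 1] * dirs[:, 0])

        inliers = sin_angle < sin_thresh
        if inliers.sum() > best_inliers.sum():
            best_foe, best_inliers = foe, inliers

    return best_foe, best_inliers
```

In this sketch the complement of the inlier mask (`~inliers`) plays the role of the initial motion evidence. A dense flow field, such as one produced by UniMatch, would first be subsampled into `points` and `flows`.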

Paper Structure

This paper contains 26 sections, 2 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Sample result of FoELS. It detects moving objects from a moving camera at various distances within the scene.
  • Figure 2: Detailed flowchart of the proposed method. The side image illustrates the process outlined in the flowchart.
  • Figure 3: Example visual results of FoELS on the DAVIS 2016 bear scene. First row (left to right): (a) input frame, (b) segmentation result, and (c) prior moving probability derived from segmentation. Second row (left to right): (d) optical flow, (e) optical flow with FoE inliers (green arrows) and outliers (red arrows), and (f) the FoE-based moving likelihood. Third row (left to right): (g) posterior moving pixel probability, (h) refined object-level moving mask, and (i) the final moving object result.
  • Figure 4: Comparison of AdversarialNet (left) and FoELS (right) across different scenarios. AdversarialNet exhibits limited generalization to unseen scenes, while FoELS maintains robust performance without scene-specific tuning. The pronounced visual gap reflects how much real-world scenes differ in complexity from standard datasets.
  • Figure 5: Example visual results of FoELS on various motion types, including parallel, opposite-direction, cross-direction, and crowded scenes. See Fig. 3 for the 9-subimage format.
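
The prior, likelihood, posterior, and object-level mask panels listed for Figure 3 suggest a per-pixel fusion followed by a per-segment decision. The sketch below shows one plausible form of that pipeline. It assumes a two-hypothesis Bayes rule in which the FoE-based likelihood is treated as P(observation | moving) and its complement as P(observation | static), plus a mean-over-segment threshold for the refinement. Both function names, the 0.5 threshold, and the exact formulas are assumptions for illustration. The page does not reproduce the paper's two equations, so they may differ.

```python
import numpy as np

def fuse_moving_probability(prior, likelihood, eps=1e-6):
    """Hypothetical per-pixel Bayes fusion of a segmentation prior and an
    FoE-based moving likelihood (both arrays with values in [0, 1])."""
    prior = np.clip(np.asarray(prior, dtype=float), eps, 1 - eps)
    likelihood = np.clip(np.asarray(likelihood, dtype=float), eps, 1 - eps)
    moving = likelihood * prior
    static = (1 - likelihood) * (1 - prior)
    return moving / (moving + static)

def refine_by_segment(posterior, segment_ids, thresh=0.5):
    """Hypothetical object-level refinement: mark a whole segment as moving
    when its mean posterior exceeds `thresh`."""
    posterior = np.asarray(posterior, dtype=float)
    segment_ids = np.asarray(segment_ids)
    mask = np.zeros(posterior.shape, dtype=bool)
    for seg in np.unique(segment_ids):
        region = segment_ids == seg
        if posterior[region].mean() > thresh:
            mask[region] = True
    return mask
```

Deciding per segment rather than per pixel is what lets a partially detected object be recovered as a whole, at the cost of relying on the segmentation boundaries (here assumed to come from a panoptic model such as OneFormer).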