Amodal Optical Flow

Maximilian Luz, Rohit Mohan, Ahmed Rida Sekkat, Oliver Sawade, Elmar Matthes, Thomas Brox, Abhinav Valada

TL;DR

Amodal optical flow is introduced to capture motion for both visible and occluded scene elements through multi-layer motion fields and occlusion-aware stratification. The authors extend AmodalSynthDrive with ground-truth amodal flow and define the Amodal Flow Quality (AFQ) metric, which jointly evaluates flow and segmentation as $\mathrm{AFQ} = \sqrt{\mathrm{mWAUC} \cdot \mathrm{mIoU}}$. They propose AmodalFlowNet, a transformer-based cost-volume encoder with a recurrent decoder that predicts per-layer motion fields and decomposed amodal masks with semantic grounding. AmodalFlowNet is shown to achieve state-of-the-art performance and to improve panoptic tracking over baselines, highlighting its practical value for robotics and dynamic scene understanding. The dataset, code, and trained models are publicly released.
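
As a rough illustration of the combined metric, the sketch below evaluates the AFQ formula above as the geometric mean of a flow score (mWAUC) and a segmentation score (mIoU). It assumes both inputs are already-averaged scores normalized to [0, 1]; how mWAUC and mIoU are computed per layer and class is not shown, and the function name `amodal_flow_quality` is hypothetical.

```python
import math


def amodal_flow_quality(m_wauc: float, m_iou: float) -> float:
    """Geometric mean of a flow score and a segmentation score.

    Assumes (hypothetically) that both scores are normalized to [0, 1].
    """
    for name, value in (("mWAUC", m_wauc), ("mIoU", m_iou)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return math.sqrt(m_wauc * m_iou)


if __name__ == "__main__":
    # Mathematically sqrt(0.64 * 0.25) = sqrt(0.16) = 0.4: a strong flow
    # score cannot fully compensate for weak segmentation.
    print(amodal_flow_quality(0.64, 0.25))
```

A geometric mean is zero whenever either factor is zero, so a method cannot score well on this combined measure by excelling at only one of the two subtasks.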

Abstract

Optical flow estimation is very challenging in situations with transparent or occluded objects. In this work, we address these challenges at the task level by introducing Amodal Optical Flow, which integrates optical flow with amodal perception. Instead of only representing the visible regions, we define amodal optical flow as a multi-layered pixel-level motion field that encompasses both visible and occluded regions of the scene. To facilitate research on this new task, we extend the AmodalSynthDrive dataset to include pixel-level labels for amodal optical flow estimation. We present several strong baselines, along with the Amodal Flow Quality metric to quantify the performance in an interpretable manner. Furthermore, we propose the novel AmodalFlowNet as an initial step toward addressing this task. AmodalFlowNet consists of a transformer-based cost-volume encoder paired with a recurrent transformer decoder which facilitates recurrent hierarchical feature propagation and amodal semantic grounding. We demonstrate the tractability of amodal optical flow in extensive experiments and show its utility for downstream tasks such as panoptic tracking. We make the dataset, code, and trained models publicly available at http://amodal-flow.cs.uni-freiburg.de.
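
To make the notion of a multi-layered pixel-level motion field concrete, the following sketch stores amodal flow as a list of layers, each pairing a dense flow field with an amodal (visible + occluded) mask on a shared image grid. The class names, field layout, and `coverage` helper are hypothetical illustrations, not the dataset's file format or the authors' code.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class AmodalFlowLayer:
    """One layer: a dense flow field plus the amodal mask it applies to."""
    flow: np.ndarray  # (H, W, 2): horizontal u and vertical v displacement in pixels
    mask: np.ndarray  # (H, W) bool: pixels covered by this layer, visible or occluded


@dataclass
class AmodalFlow:
    """Hypothetical container for a multi-layered motion field on one grid."""
    layers: list[AmodalFlowLayer]

    def __post_init__(self) -> None:
        if not self.layers:
            raise ValueError("at least one layer is required")
        h, w = self.layers[0].mask.shape
        for i, layer in enumerate(self.layers):
            if layer.flow.shape != (h, w, 2) or layer.mask.shape != (h, w):
                raise ValueError(f"layer {i} does not match grid {(h, w)}")

    def coverage(self) -> np.ndarray:
        """Number of layers that define motion at each pixel.

        Unlike modal flow, which has exactly one vector per pixel, values
        above 1 indicate that some layer describes occluded motion there.
        """
        return np.sum([layer.mask for layer in self.layers], axis=0)


if __name__ == "__main__":
    h, w = 2, 3
    background = AmodalFlowLayer(np.zeros((h, w, 2)), np.ones((h, w), dtype=bool))
    car_mask = np.zeros((h, w), dtype=bool)
    car_mask[:, :2] = True
    car = AmodalFlowLayer(np.full((h, w, 2), [4.0, 0.0]), car_mask)
    # 2 where the car overlaps (and hides) the background, 1 elsewhere
    print(AmodalFlow([background, car]).coverage())
```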

Paper Structure

This paper contains 11 sections, 8 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: Illustration of Amodal Optical Flow, which aims to predict a multi-layered pixel-level motion field encompassing both visible and occluded regions of the scene. This task can represent transparent and partially occluded objects while also reducing the fragmentation of object segments through amodal (visible + occluded) motion representation of scene elements.
  • Figure 2: AmodalFlowNet architecture. Flow and corresponding amodal masks (yellow blocks) are estimated recurrently over both refinement steps (outer, blue arrow) and amodal layers (inner, green arrows). The decoder structure for the standard optical flow is retained from the baseline model. Additional semantic and mask predictions (red, green, and purple blocks) guide the network.
  • Figure 3: Log histograms of the optical flow direction and of the spatial derivative of the horizontal optical flow velocity $u$. The modal statistics of AmodalSynthDrive (green) resemble those of KITTI (blue), indicating similar motion patterns, albeit with differences in spatial irregularity. This difference can be attributed to the high level of detail and precision of the optical flow ground truth in our dataset.
  • Figure 4: Qualitative comparison of amodal optical flow predictions from our proposed AmodalFlowNet with the baseline M1 on the AmodalSynthDrive dataset. For visualization, we sequentially superimpose the multi-layer amodal optical flow predictions in the order $M_0, M_1, \ldots, M_{N-1}$.
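
The sequential superimposition described for Figure 4 can be sketched as a painter's-style composite: layers are drawn in the order $M_0, \ldots, M_{N-1}$, and each layer overwrites earlier ones wherever its mask is set. This assumes that each $M_k$ denotes the k-th layer's mask paired with its flow and that later layers are drawn on top; the function name `superimpose_layers` is hypothetical, and converting the composite to colors (e.g. with a standard flow color wheel) is not shown.

```python
import numpy as np


def superimpose_layers(flows, masks):
    """Paint layers in order; later layers overwrite earlier ones.

    flows: sequence of (H, W, 2) arrays; masks: sequence of (H, W) masks.
    Pixels covered by no layer are left as NaN.
    """
    if len(flows) == 0 or len(flows) != len(masks):
        raise ValueError("need the same, non-zero number of flows and masks")
    h, w = np.asarray(masks[0]).shape
    canvas = np.full((h, w, 2), np.nan)
    for flow, mask in zip(flows, masks):
        mask = np.asarray(mask, dtype=bool)
        canvas[mask] = np.asarray(flow)[mask]  # overwrite only where this layer is present
    return canvas


if __name__ == "__main__":
    flows = [np.zeros((2, 2, 2)), np.full((2, 2, 2), 3.0)]
    masks = [np.array([[True, True], [True, False]]),
             np.array([[True, False], [False, False]])]
    print(superimpose_layers(flows, masks))
```

Such a composite shows only one motion vector per pixel, so occluded motion in earlier layers is hidden in the visualization even though the per-layer predictions retain it.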