
I Can't Believe It's Not Scene Flow!

Ishan Khatri, Kyle Vedder, Neehar Peri, Deva Ramanan, James Hays

TL;DR

This work shows that state-of-the-art scene flow methods fail to describe the motion of small, safety-critical objects, a failure that standard metrics hide. It introduces Bucket Normalized EPE, a class-aware, speed-normalized evaluation protocol, and TrackFlow, a simple detector-plus-tracker baseline that achieves state-of-the-art Threeway EPE and large gains on Bucket Normalized EPE. The results argue that scene flow evaluation should be class- and speed-aware and that supervised scene flow methods should address point class imbalance. The evaluation code is publicly available, and the TrackFlow results point to detector quality and recall as the key factors in tracking-based scene flow, which matters most for pedestrians and other vulnerable road users (VRUs).

Abstract

Current scene flow methods broadly fail to describe motion on small objects, and current scene flow evaluation protocols hide this failure by averaging over many points, with most drawn from larger objects. To fix this evaluation failure, we propose a new evaluation protocol, Bucket Normalized EPE, which is class-aware and speed-normalized, enabling contextualized error comparisons between object types that move at vastly different speeds. To highlight current method failures, we propose a frustratingly simple supervised scene flow baseline, TrackFlow, built by bolting a high-quality pretrained detector (trained using many class rebalancing techniques) onto a simple tracker, that produces state-of-the-art performance on current standard evaluations and large improvements over prior art on our new evaluation. Our results make it clear that all scene flow evaluations must be class and speed aware, and supervised scene flow methods must address point class imbalances. We release the evaluation code publicly at https://github.com/kylevedder/BucketedSceneFlowEval.
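
To make the idea of a class-aware, speed-normalized error concrete, here is a minimal sketch. It is not the released evaluation code: the function name, the bucket edges, and the choice to divide each bucket's mean EPE by that bucket's mean ground-truth speed are all assumptions made for illustration, and the slowest bucket is treated as "static" and reported as a plain EPE.

```python
import math
from collections import defaultdict

def bucket_normalized_epe(pred_flow, gt_flow, classes,
                          speed_edges=(0.0, 0.5, 1.0, 2.0)):
    """Hypothetical per-class, per-speed-bucket error summary.

    pred_flow, gt_flow: lists of (dx, dy, dz) per point.
    classes: list of class labels, one per point.
    speed_edges: assumed lower edges of the speed buckets; bucket 0 is
    treated as static. Speed here is the ground-truth flow magnitude.
    """
    # (class, bucket) -> [sum of EPE, sum of speed, point count]
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for p, g, c in zip(pred_flow, gt_flow, classes):
        epe = math.dist(p, g)       # endpoint error for this point
        speed = math.hypot(*g)      # ground-truth motion magnitude
        bucket = max(i for i, e in enumerate(speed_edges) if speed >= e)
        acc = sums[(c, bucket)]
        acc[0] += epe
        acc[1] += speed
        acc[2] += 1

    results = {}
    for c in set(classes):
        static_epe, dynamic = None, []
        for (cls, bucket), (epe_sum, speed_sum, n) in sums.items():
            if cls != c:
                continue
            if bucket == 0:
                static_epe = epe_sum / n
            else:
                # Normalize mean error by mean speed within the bucket.
                dynamic.append((epe_sum / n) / (speed_sum / n))
        results[c] = {
            "static_epe": static_epe,
            "dynamic_normalized_epe":
                sum(dynamic) / len(dynamic) if dynamic else None,
        }
    return results
```

Under these assumptions, a method that predicts zero flow for a moving pedestrian scores a dynamic normalized error of 1.0 for that class regardless of how many car points it gets right, whereas a single average over all points would be dominated by the cars.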

Paper Structure (24 sections, 1 equation, 11 figures, 4 tables)

Figures (11)

  • Figure 1: We visualize an example of two pedestrians (walking from left to right), cherry-picked to have unusually high-density lidar returns, making it particularly easy to estimate flow. We expect that state-of-the-art scene flow methods should work well in this case, but find that all prior art fails catastrophically. Notably, TrackFlow is the only method to estimate flow for these pedestrians.
  • Figure 2: Number of points from each semantic meta-class for Argoverse 2's val split. Although PEDESTRIAN instances are common, they contribute less than 1% of the total number of points owing to their small instance size relative to CAR and OTHER VEHICLES. Number of points (Y axis) shown on a log scale.
  • Figure 3: Overview of the Scene Flow via Tracking framework. Our proposed framework generates scene flow estimates using rigid transformations to describe point-level motion within a 3D object track.
  • Figure 4: Threeway EPE and Threeway EPE's Foreground Dynamic performance of recent supervised and unsupervised scene flow methods on Argoverse 2's test split. Supervised methods shown with hatching. Lower is better. Method color is consistent between plots. We find that all recent methods achieve 5cm error on Threeway EPE, suggesting that these approaches work well in-the-wild. However, this number hides the failure of these methods to describe small object motion.
  • Figure 5: Per meta-class Dynamic Normalized EPE of recent supervised and unsupervised scene flow estimation methods on Argoverse 2's test split. Supervised methods shown with hatching. Lower is better. Method color and position are consistent between plots. TrackFlow significantly improves over prior work on both pedestrian and wheeled VRUs. Notably, Bucket Normalized EPE quantitatively demonstrates significant method performance differences not highlighted in Threeway EPE.
  • ...and 6 more figures
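
The Figure 3 caption describes scene flow derived from rigid transformations within a 3D object track. A minimal sketch of that idea follows. The pose format `(x, y, z, yaw)`, the function name `rigid_flow`, and the restriction to rotation about the vertical axis are assumptions for illustration, not details taken from the paper.

```python
import math

def rigid_flow(points, pose_t, pose_t1):
    """Flow for points on one tracked object, assuming rigid motion.

    points: list of (x, y, z) points belonging to the object at time t.
    pose_t, pose_t1: hypothetical box poses (x, y, z, yaw) of the same
    track at consecutive frames.
    """
    x0, y0, z0, yaw0 = pose_t
    x1, y1, z1, yaw1 = pose_t1
    dyaw = yaw1 - yaw0
    c, s = math.cos(dyaw), math.sin(dyaw)

    flows = []
    for px, py, pz in points:
        # Offset of the point from the box center at time t.
        rx, ry, rz = px - x0, py - y0, pz - z0
        # Rotate the offset by the yaw change, then place it at the
        # box center at time t+1.
        qx = c * rx - s * ry + x1
        qy = s * rx + c * ry + y1
        qz = rz + z1
        # Flow is the displacement from the original position.
        flows.append((qx - px, qy - py, qz - pz))
    return flows
```

Because every point in a box shares one transform, the quality of the flow in such a scheme depends on whether the detector finds the object and how well the tracker associates it across frames.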