UniFlow: Towards Zero-Shot LiDAR Scene Flow for Autonomous Vehicles via Cross-Domain Generalization

Siyi Li, Qingwen Zhang, Ishan Khatri, Kyle Vedder, Deva Ramanan, Neehar Peri

TL;DR

This work tackles the generalization of LiDAR-based scene flow across diverse sensors and datasets. It introduces UniFlow, a simple multi-dataset training framework that unifies four AV datasets and retrains state-of-the-art scene-flow models, yielding substantial improvements in both in-domain and zero-shot generalization. The approach achieves new state-of-the-art results on Waymo and nuScenes and strong zero-shot performance on TruckScenes, with analyses highlighting the role of velocity distributions and low-level geometric priors in cross-domain transfer. The findings suggest that learning universal motion priors through dataset unification can greatly enhance robust 3D motion understanding for autonomous vehicles and beyond, while also identifying avenues for future work in non-AV domains and high-speed scenarios.

Abstract

LiDAR scene flow is the task of estimating per-point 3D motion between consecutive point clouds. Recent methods achieve centimeter-level accuracy on popular autonomous vehicle (AV) datasets, but are typically only trained and evaluated on a single sensor. In this paper, we aim to learn general motion priors that transfer to diverse and unseen LiDAR sensors. However, prior work in LiDAR semantic segmentation and 3D object detection demonstrates that naively training on multiple datasets yields worse performance than single-dataset models. Interestingly, we find that this conventional wisdom does not hold for motion estimation, and that state-of-the-art scene flow methods greatly benefit from cross-dataset training. We posit that low-level tasks such as motion estimation may be less sensitive to sensor configuration; indeed, our analysis shows that models trained on fast-moving objects (e.g., from highway datasets) perform well on fast-moving objects, even across different datasets. Informed by our analysis, we propose UniFlow, a family of feedforward models that unifies and trains on multiple large-scale LiDAR scene flow datasets with diverse sensor placements and point cloud densities. Our frustratingly simple solution establishes a new state-of-the-art on Waymo and nuScenes, improving over prior work by 5.1% and 35.2% respectively. Moreover, UniFlow achieves state-of-the-art accuracy on unseen datasets like TruckScenes, outperforming prior TruckScenes-specific models by 30.1%.

Paper Structure

This paper contains 12 sections, 6 figures, 15 tables.

Figures (6)

  • Figure 1: Dataset Diversity. We visualize the front-center RGB (top), LiDAR sensor positions (middle) and BEV LiDAR point clouds (bottom) for Argoverse 2, Waymo, nuScenes and TruckScenes. Notably, all four datasets use different sensors, and collect data in different environments. Specifically, Argoverse 2, Waymo, and nuScenes collect data in urban city centers with sedans, while TruckScenes primarily collects data on highways with a truck. Due to the diversity of environments and sensor configurations, contemporary LiDAR scene flow methods typically only train and evaluate on each dataset independently. However, we find that multi-dataset training significantly improves both in-domain and out-of-domain generalization. Note that RGB images are shown for visualization purposes only; we address LiDAR-only scene flow for AVs.
  • Figure 2: Cross-Dataset Generalization Correlates with Velocity Distribution. We plot the velocity distributions for the AV2, Waymo, nuScenes, and TruckScenes train sets (top) and the Dynamic Mean EPE per velocity bin of Flow4D trained on AV2, Waymo, nuScenes, TruckScenes, and UniFlow (bottom). Notably, Flow4D trained on TruckScenes outperforms Flow4D trained on any other dataset for fast-moving objects (2.0, $\infty$) across all datasets because it has the largest number of fast-moving objects.
  • Figure 3: Zero-Shot Generalization on TruckScenes. We qualitatively compare the dataset-specific $\Delta$Flow model and our $\Delta$Flow (UniFlow) model above. $\Delta$Flow (UniFlow) produces more accurate vehicle motion estimates in general, and avoids falsely predicting motion for rain artifacts (on the top left) as seen in $\Delta$Flow (top row). Next, $\Delta$Flow (UniFlow) correctly estimates the motion of the van since it has been trained on more examples of rare classes (middle row). Lastly, $\Delta$Flow (UniFlow) generalizes significantly better at long range, accurately estimating the truck's motion at $\sim$35 m and the car's motion at $\sim$70 m, despite extreme point sparsity (bottom row).
  • Figure 4: Scaling Laws. We evaluate both the in-distribution (on AV2, nuScenes, and Waymo) and out-of-distribution (on TruckScenes) performance of Flow4D (UniFlow) with different amounts of training data. Unsurprisingly, increasing data reduces Dynamic Mean EPE. However, we find that data augmentation is significantly more important for out-of-distribution performance, and has minimal impact on in-distribution performance. Lower is better.
  • Figure 5: Comparing Original and Downsampled Velocity Distributions. We plot the velocity distributions of the original AV2, Waymo, and nuScenes training sets (top row), their corresponding down-sampled "fast" versions (bottom row), and the unified distribution that combines the three datasets (right column).
  • ...and 1 more figure
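
The per-velocity-bin analysis described in the Figure 2 caption can be illustrated with a minimal sketch. It groups points by the magnitude of their ground-truth flow and averages the endpoint error (EPE, the Euclidean distance between predicted and ground-truth flow vectors) within each bin. This is not the paper's exact Dynamic Mean EPE metric, which may apply additional per-class or normalization steps not described here. The function name `mean_epe_per_speed_bin`, the bin edges other than 2.0, and the half-open bin convention are assumptions made for illustration.

```python
import numpy as np


def mean_epe_per_speed_bin(pred_flow, gt_flow,
                           bin_edges=(0.0, 0.5, 1.0, 2.0, np.inf)):
    """Average endpoint error per ground-truth speed bin.

    pred_flow, gt_flow: (N, 3) arrays of per-point 3D flow vectors.
    bin_edges: hypothetical speed thresholds; only the 2.0 edge is
    taken from the Figure 2 caption, the rest are illustrative.
    """
    pred_flow = np.asarray(pred_flow, dtype=float)
    gt_flow = np.asarray(gt_flow, dtype=float)

    # Endpoint error: L2 distance between predicted and true flow.
    epe = np.linalg.norm(pred_flow - gt_flow, axis=1)
    # Speed proxy: magnitude of the ground-truth flow vector.
    speed = np.linalg.norm(gt_flow, axis=1)

    results = {}
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        # Half-open bins [lo, hi) are an assumption of this sketch.
        mask = (speed >= lo) & (speed < hi)
        results[(lo, hi)] = float(epe[mask].mean()) if mask.any() else float("nan")
    return results


# Toy data: four points with made-up flow vectors.
gt = [[0.0, 0.0, 0.0], [0.6, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
pred = [[0.1, 0.0, 0.0], [0.6, 0.2, 0.0], [3.0, 0.0, 0.5], [0.0, 3.0, 0.0]]
per_bin = mean_epe_per_speed_bin(pred, gt)
for (lo, hi), value in per_bin.items():
    print(f"[{lo}, {hi}): mean EPE = {value:.3f}")
```

Empty bins return NaN rather than zero so that a bin with no points is not mistaken for a bin with perfect predictions, which matters when comparing datasets whose velocity distributions differ as much as those in Figure 2.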