POD: Predictive Object Detection with Single-Frame FMCW LiDAR Point Cloud
Yining Shi, Kun Jiang, Xin Zhao, Kangan Qian, Chuchu Xie, Tuopu Wen, Mengmeng Yang, Diange Yang
TL;DR
This work addresses the challenge of detecting not only current objects but also their short-term future locations using only a single FMCW LiDAR frame. It introduces Predictive Object Detection (POD), which generates virtual future points from absolute velocity and encodes two-frame voxel features with 4D backbones (SparseConv4D and 4D Voxel Transformer) to produce both current and predictive BEV detections. The approach uses the radial velocity information intrinsic to FMCW LiDAR to distinguish dynamic from static elements, and it shortens reaction time by not relying on historical frames. Experiments on an in-house FMCW dataset show competitive standard object detection performance and superior predictive performance, and an analysis of velocity preprocessing and encoder design offers practical insights for real-time autonomous perception. The method lays groundwork for fast, single-frame prediction in safety-critical driving scenarios and points to future work on transient hazards and broader data release.
Abstract
LiDAR-based 3D object detection is a fundamental task in autonomous driving. This paper explores the unique advantage of Frequency Modulated Continuous Wave (FMCW) LiDAR in autonomous perception. Given a single-frame FMCW point cloud with radial velocity measurements, we want our object detector to predict the short-term future locations of objects using only the current frame of sensor data, and thereby to respond quickly to immediate danger. To achieve this, we extend the standard object detection task to a novel task named predictive object detection (POD), which aims to predict the short-term future location and dimensions of objects based solely on current observations. A motion prediction task typically requires historical sensor information to process the temporal context of each object; because our detector avoids multi-frame historical information, it can respond much faster to potential dangers. The core advantage of FMCW LiDAR lies in the radial velocity associated with every reflected point. We propose a novel POD framework whose core idea is to generate virtual future points using a ray casting mechanism, build a virtual two-frame point cloud from the current and virtual future frames, and encode the two-frame voxel features with a sparse 4D encoder. The 4D voxel features are then separated by temporal index and remapped into two Bird's Eye View (BEV) features: one decoded for standard current-frame object detection and the other for future predictive object detection. Extensive experiments on our in-house dataset demonstrate the state-of-the-art standard and predictive detection performance of the proposed POD framework.
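
To make the two-frame construction more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a simplification in which each point is displaced along its own sensor ray by its radial velocity times a prediction horizon; the paper's actual ray casting mechanism and velocity preprocessing may differ. The function names, the `dt` value, the sign convention (positive radial velocity means moving away from the sensor), and the 0/1 temporal index are all hypothetical choices made for illustration.

```python
import numpy as np


def make_virtual_two_frame_cloud(points, radial_velocity, dt=0.5, sensor_origin=None):
    """Build a (2N, 4) array of [x, y, z, t] rows from one FMCW frame.

    points:          (N, 3) Cartesian coordinates of the current frame.
    radial_velocity: (N,) per-point radial velocity in m/s
                     (assumed convention: positive = moving away from the sensor).
    dt:              prediction horizon in seconds (hypothetical value).
    Rows with t == 0 are the measured points; rows with t == 1 are virtual
    future points obtained by sliding each point along its sensor ray.
    """
    points = np.asarray(points, dtype=np.float64)
    v_r = np.asarray(radial_velocity, dtype=np.float64)
    origin = np.zeros(3) if sensor_origin is None else np.asarray(sensor_origin, dtype=np.float64)

    # Unit direction of the ray from the sensor to each point.
    rays = points - origin
    ranges = np.linalg.norm(rays, axis=1, keepdims=True)
    unit_rays = rays / np.clip(ranges, 1e-9, None)  # guard against zero range

    # Simplifying assumption: motion is purely radial over the horizon dt.
    future_points = points + unit_rays * (v_r[:, None] * dt)

    # Attach the temporal index as a fourth coordinate.
    current_4d = np.hstack([points, np.zeros((len(points), 1))])
    future_4d = np.hstack([future_points, np.ones((len(points), 1))])
    return np.vstack([current_4d, future_4d])


def split_by_temporal_index(cloud_4d):
    """Separate a 4D cloud back into (current_xyz, future_xyz) by its t column."""
    t = cloud_4d[:, 3]
    return cloud_4d[t == 0, :3], cloud_4d[t == 1, :3]
```

In this sketch, a static point (zero radial velocity) maps onto itself in the virtual future frame, while a moving point is shifted along its ray. A sparse 4D encoder would consume the combined array after voxelization, and the split by temporal index mirrors, at the point level, how the abstract describes separating 4D voxel features into current and future BEV features.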
