DriveLiDAR4D: Sequential and Controllable LiDAR Scene Generation for Autonomous Driving

Kaiwen Cai, Xinze Liu, Xia Zhou, Hengtong Hu, Jie Xiang, Luyao Zhang, Xueyang Zhang, Kun Zhan, Yifei Zhan, Xianpeng Lang

TL;DR

The paper addresses the need for realistic, sequential LiDAR scene generation with fine-grained foreground and background control for autonomous driving. It introduces DriveLiDAR4D, a multimodal conditioning framework paired with LiDAR4DNet, an equirectangular spatial-temporal diffusion model featuring EST-Conv and EST-Trans to ensure temporal and spatial coherence. The approach leverages road sketches, scene captions, and object priors to guide generation, achieving state-of-the-art FRD, MMD, JSD, and FVD on nuScenes and KITTI-360, and improving downstream 3D object detection fidelity. These advances enable end-to-end, controllable, and realistic 4D LiDAR scene synthesis that can enhance simulation, evaluation, and safety analysis in autonomous driving systems.

Abstract

The generation of realistic LiDAR point clouds plays a crucial role in the development and evaluation of autonomous driving systems. Although recent methods for 3D LiDAR point cloud generation have shown significant improvements, they still face notable limitations, including the lack of sequential generation capabilities and the inability to produce accurately positioned foreground objects and realistic backgrounds. These shortcomings hinder their practical applicability. In this paper, we introduce DriveLiDAR4D, a novel LiDAR generation pipeline consisting of multimodal conditions and a novel sequential noise prediction model, LiDAR4DNet, capable of producing temporally consistent LiDAR scenes with highly controllable foreground objects and realistic backgrounds. To the best of our knowledge, this is the first work to address the sequential generation of LiDAR scenes with full scene manipulation capability in an end-to-end manner. We evaluated DriveLiDAR4D on the nuScenes and KITTI datasets. On nuScenes, it achieves an FRD score of 743.13 and an FVD score of 16.96, surpassing the current state-of-the-art (SOTA) method, UniScene, by 37.2% in FRD and 24.1% in FVD.

Paper Structure

This paper contains 15 sections, 4 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Comparison of LiDAR scenes generated by different methods on the nuScenes val split. DriveLiDAR4D is the first work to achieve sequential LiDAR scene generation with highly controllable scene manipulation abilities, including foreground control, background control and object-fidelity enhancement.
  • Figure 2: Visualization of the multimodal conditions of an example from the nuScenes dataset (Images have been resized for better visualization).
  • Figure 3: The pipeline of DriveLiDAR4D. We first derive multimodal conditions, including road sketches, scene captions and object priors, from a given road scene (see the section on multimodal conditions). Then, the proposed LiDAR4DNet predicts the sequential noises based on the multimodal conditions, where Equirectangular Spatial-Temporal Convolution (EST-Conv) and Equirectangular Spatial-Temporal Transformer (EST-Trans) enforce spatial and temporal consistency (see the section on LiDAR4DNet).
  • Figure 4: LiDAR scenes generated by DriveLiDAR4D with different conditions on the nuScenes val split.
  • Figure 5: Visualization of fine-grained scene manipulation of DriveLiDAR4D on the nuScenes val split. Top row: road sketches. Bottom row: LiDAR scenes. a) GT scene, b) Generated scene with multimodal conditions, c) Generated scene with edited multimodal conditions.
  • ...and 1 more figure
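
The TL;DR and the Figure 3 caption describe LiDAR4DNet as operating on an equirectangular representation. This excerpt does not say how the paper builds that representation, so the following is only a minimal sketch of the standard spherical (range-image) projection commonly used for spinning LiDAR sensors. The function name `project_to_range_image`, the image resolution, and the field-of-view limits are illustrative assumptions, not values taken from the paper.

```python
import math


def project_to_range_image(points, height=32, width=1024,
                           fov_up_deg=10.0, fov_down_deg=-30.0):
    """Project (x, y, z) points onto an equirectangular range image.

    Rows index elevation (row 0 is the highest angle) and columns index
    azimuth. Empty pixels hold 0.0. When several points fall into the same
    pixel, the nearest one is kept.
    All default parameters are assumed values for illustration only.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down

    image = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # a point at the sensor origin has no direction
        azimuth = math.atan2(y, x)        # in [-pi, pi]
        elevation = math.asin(z / r)      # in [-pi/2, pi/2]
        if elevation < fov_down or elevation > fov_up:
            continue  # outside the assumed vertical field of view

        # Normalise both angles to [0, 1], then scale to pixel indices.
        u = 0.5 * (1.0 - azimuth / math.pi)
        v = (fov_up - elevation) / fov
        col = min(int(u * width), width - 1)
        row = min(int(v * height), height - 1)

        # Keep the closest return per pixel.
        if image[row][col] == 0.0 or r < image[row][col]:
            image[row][col] = r
    return image


if __name__ == "__main__":
    demo = [(10.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
    img = project_to_range_image(demo, height=3, width=8,
                                 fov_up_deg=15.0, fov_down_deg=-15.0)
    for row in img:
        print(row)
```

In such a layout the left and right image borders are neighbours in 3D, since azimuth wraps around, which is one reason a model working on this grid may need operators designed for it rather than plain 2D convolutions.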