LiDAR-based 4D Occupancy Completion and Forecasting

Xinhao Liu; Moonjun Gong; Qi Fang; Haoyu Xie; Yiming Li; Hang Zhao; Chen Feng

TL;DR

This paper defines Occupancy Completion and Forecasting (OCF), a novel LiDAR-based 4D perception task that jointly performs scene completion and temporal forecasting on an Eulerian voxel grid. Supervision is built by voxelizing and aggregating multiple LiDAR sweeps, with ego-motion compensated so that all data are expressed in the $t=0$ frame. The paper also introduces OCFBench, a large dataset curated from public autonomous driving data. Baseline and enhanced architectures (PCF, ConvLSTM, Conv3D) are benchmarked with a loss that combines BCE and soft‑IoU, and evaluated with mIoU, mAP, precision, recall, and F1 across multiple forecasting horizons and domains. Results show that Conv3D generally performs best, that adding soft‑IoU improves most metrics, and that cross‑domain gaps remain a challenge, highlighting the need for domain‑robust 4D occupancy perception. The work provides a scalable dataset and baseline methods to spur further research on dense 4D occupancy perception for autonomous driving.
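The TL;DR mentions a training loss that combines BCE and soft-IoU, plus several occupancy metrics. The NumPy sketch below shows one common way to write these; it is not the authors' implementation. The function names, the weight `iou_weight`, the exclusion of unknown voxels through a `known` mask, and the definition of mIoU as the mean of per-frame IoUs are all assumptions made for illustration. mAP is omitted because it needs ranked continuous scores rather than binary predictions.

```python
import numpy as np


def bce_soft_iou_loss(logits, target, known, iou_weight=1.0, eps=1e-6):
    """BCE plus soft-IoU loss over an occupancy volume such as (T, X, Y, Z).

    known: boolean mask; False marks unknown voxels, which are left out of both terms.
    iou_weight: hypothetical weight on the soft-IoU term.
    """
    y = np.asarray(target, dtype=float)
    m = np.asarray(known, dtype=float)
    # Sigmoid, clipped away from 0 and 1 so the logs stay finite.
    prob = np.clip(1.0 / (1.0 + np.exp(-logits)), eps, 1.0 - eps)
    bce = -(y * np.log(prob) + (1.0 - y) * np.log(1.0 - prob))
    bce = (bce * m).sum() / max(m.sum(), 1.0)
    # Soft IoU: probabilities replace hard 0/1 predictions, so the term is differentiable.
    inter = (prob * y * m).sum()
    union = ((prob + y - prob * y) * m).sum()
    return bce + iou_weight * (1.0 - inter / (union + eps))


def frame_metrics(pred, target, known):
    """IoU, precision, recall, and F1 for one binary frame, ignoring unknown voxels."""
    known = np.asarray(known, dtype=bool)
    p = np.asarray(pred, dtype=bool) & known
    t = np.asarray(target, dtype=bool) & known
    tp = int(np.sum(p & t))
    fp = int(np.sum(p & ~t))
    fn = int(np.sum(~p & t))
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"iou": tp / max(tp + fp + fn, 1), "precision": precision,
            "recall": recall, "f1": f1}


def sequence_miou(pred_seq, target_seq, known_seq):
    """Mean of per-frame IoUs over the forecast horizon (one possible mIoU definition)."""
    ious = [frame_metrics(p, t, k)["iou"]
            for p, t, k in zip(pred_seq, target_seq, known_seq)]
    return float(np.mean(ious))
```

Computing `frame_metrics` separately for each future frame gives the kind of per-frame IoU curve plotted in Figure 4. Masking unknown voxels in both the loss and the metrics keeps unobserved space from being counted as either right or wrong.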

Abstract

Scene completion and forecasting are two popular perception problems in research on mobile agents such as autonomous vehicles. Existing approaches treat the two problems in isolation, so the two aspects are perceived separately. In this paper, we introduce a novel LiDAR perception task, Occupancy Completion and Forecasting (OCF), in the context of autonomous driving to unify these aspects into a cohesive framework. This task requires new algorithms that address three challenges together: (1) sparse-to-dense reconstruction, (2) partial-to-complete hallucination, and (3) 3D-to-4D prediction. To enable supervision and evaluation, we curate a large-scale dataset, termed OCFBench, from public autonomous driving datasets. We analyze the performance of closely related existing baseline models, as well as our own, on this dataset. We envision that this research will inspire and call for further investigation in this evolving and crucial area of 4D perception. Our code for data curation and baseline implementation is available at https://github.com/ai4ce/Occ4cast.
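Sparse-to-dense reconstruction and 3D-to-4D prediction both rest on expressing every sweep in a common coordinate frame and rasterizing it into a fixed voxel grid, which yields a (T, X, Y, Z) Eulerian occupancy tensor. The minimal sketch below assumes each sweep comes with a 4x4 sensor-to-world pose. The function names, grid origin, and voxel size are illustrative assumptions and are not taken from the OCFBench code.

```python
import numpy as np


def to_frame0(points, pose_t, pose_0):
    """Map (N, 3) points from the sensor frame at time t into the sensor frame at t=0.

    pose_t, pose_0: 4x4 sensor-to-world transforms (assumed to come from the dataset's ego poses).
    """
    homo = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coordinates, (N, 4)
    world = homo @ pose_t.T                                 # sensor(t) -> world
    return (world @ np.linalg.inv(pose_0).T)[:, :3]         # world -> sensor(0)


def voxelize(points, origin, voxel_size, grid_shape):
    """Boolean occupancy grid; points that fall outside the grid are dropped."""
    idx = np.floor((points - origin) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    grid = np.zeros(grid_shape, dtype=bool)
    grid[tuple(idx[inside].T)] = True
    return grid


def build_occupancy_sequence(sweeps, poses, ref, origin, voxel_size, grid_shape):
    """Stack one grid per sweep, all expressed in the frame of sweeps[ref] (the t=0 frame),
    into a (T, X, Y, Z) Eulerian occupancy tensor."""
    return np.stack([
        voxelize(to_frame0(pts, pose, poses[ref]), origin, voxel_size, grid_shape)
        for pts, pose in zip(sweeps, poses)
    ])
```

For example, if the ego vehicle moves 1 m along x between two sweeps, the same sensor-frame point lands in a voxel shifted by 1 m in the t=0 grid. OR-ing the time slices (`seq.any(axis=0)`) approximates multi-sweep aggregation for densifying labels, but only for static geometry: rigid ego-motion compensation alone smears moving objects, which is why per-object registration is needed (Figure 3a).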

Paper Structure (13 sections, 1 equation, 5 figures, 4 tables)

INTRODUCTION
RELATED WORK
Problem formulation
DATA CURATION
Data processing pipeline
OCFBench dataset
EXPERIMENT
Benchmark methods
Evaluation metric
Common results on all datasets
Results for cross-domain adaptation
Model specs.
Conclusion

Figures (5)

Figure 1: Distinctions between OCF and related tasks. (a) All tasks take a sequence or a single LiDAR sweep as input. (b) SSC aims to densify, complete, and semantically predict on the $t=0$ frame. (c) Point/occupancy forecasting outputs a sparse and Lagrangian specification of the scene geometry's motion field. (d) OCF combines scene completion and occupancy forecasting in a spatial-temporal way, outputting a dense and Eulerian motion field. The color gradient in (d) indicates the z-coordinate.
Figure 2: Illustration of the OCF task. The input is provided as a sequence of sparse LiDAR sweeps from $t=-T$ to $t=0$. The output is a sequence of densified and completed voxels from $t=0$ to $t=T$. The color gradient indicates the z-coordinate of each voxel. All point clouds and voxels are expressed in the coordinate frame at $t=0$. The yellow bounding boxes highlight typical moving objects. The images on the top row are only for visualization and are not included in the input. Figure best viewed in color.
Figure 3: Steps in data processing. (a) Dynamic object synchronization handles the spatial-temporal tubes of moving objects by registering each object individually. (b) Unknown voxels are identified with a ray-casting algorithm and ignored during supervision and evaluation. (c) Changing sensor extrinsics are compensated for by transforming all coordinate frames into the $t=0$ frame.
Figure 4: Performance degradation w.r.t. time. We show the per-frame IoU for each method when forecasting 10 future frames with 5/10 input frames.
Figure 5: Results for cross-domain evaluation. All methods were trained on OCFBench-Lyft and tested on the other two datasets with the 10/10 input/output setup. The degradation in percentage is marked on the chart.
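Figure 3(b) describes finding unknown voxels by ray casting. One way to realize this is sketched below. Voxels crossed by a ray from the sensor to a LiDAR return are treated as free, the voxel containing the return as occupied, and voxels that no ray reaches stay unknown. The sketch samples each ray at half-voxel steps, which approximates exact voxel traversal (such as the Amanatides-Woo algorithm) and does not reproduce the paper's own procedure. The label constants and the function name are hypothetical.

```python
import numpy as np

# Hypothetical label values for the three voxel states.
UNKNOWN, FREE, OCCUPIED = -1, 0, 1


def label_voxels(points, sensor_origin, origin, voxel_size, grid_shape):
    """Label voxels FREE, OCCUPIED, or UNKNOWN by casting a ray from the sensor to each return.

    Rays are sampled every half voxel, an approximation of exact voxel traversal.
    """
    shape = np.array(grid_shape)
    labels = np.full(grid_shape, UNKNOWN, dtype=np.int8)

    def voxel_index(p):
        return np.floor((p - origin) / voxel_size).astype(int)

    def in_grid(idx):
        return bool(np.all((idx >= 0) & (idx < shape)))

    step = voxel_size / 2.0
    for p in points:
        ray = p - sensor_origin
        length = np.linalg.norm(ray)
        # Samples strictly before the return: the beam passed through, so mark FREE
        # (never downgrading a voxel that another return already marked OCCUPIED).
        for s in np.arange(int(length / step)) * step:
            idx = voxel_index(sensor_origin + ray * (s / length))
            if in_grid(idx) and labels[tuple(idx)] != OCCUPIED:
                labels[tuple(idx)] = FREE
        # The voxel containing the return is occupied.
        end = voxel_index(p)
        if in_grid(end):
            labels[tuple(end)] = OCCUPIED
    return labels
```

The mask `labels != UNKNOWN` then selects the voxels whose state was actually observed, which is the set that supervision and evaluation would use. The per-point Python loop is only for readability; a practical pipeline would vectorize the traversal or run it on a GPU.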
