Vision-based 3D occupancy prediction in autonomous driving: a review and outlook

Yanan Zhang; Jinqing Zhang; Zengran Wang; Junhao Xu; Di Huang

TL;DR

The paper surveys vision-based 3D occupancy prediction for autonomous driving, emphasizing how voxel-level occupancy and dense semantic labeling from multi-view imagery can complement or surpass traditional object-centric perception. It classifies methods into feature-enhancement, deployment-friendly, and label-efficient categories, detailing BEV/TPV/voxel representations, strategies to reduce computation, and ways to supervise without dense annotations. Key contributions include a structured taxonomy, critical comparisons across datasets and metrics, and a discussion of open challenges with concrete future directions. The review also highlights ongoing trends toward open-vocabulary semantics, 4D occupancy forecasting, and collaborative perception to advance robust, scalable autonomous driving perception systems.

Abstract

In recent years, autonomous driving has attracted growing attention for its potential to relieve drivers' burdens and improve driving safety. Vision-based 3D occupancy prediction, which predicts the spatial occupancy status and semantics of 3D voxel grids around the autonomous vehicle from image inputs, is an emerging perception task well suited to cost-effective perception systems for autonomous driving. Although numerous studies have demonstrated the advantages of 3D occupancy prediction over object-centric perception tasks, a dedicated review of this rapidly developing field is still lacking. In this paper, we first introduce the background of vision-based 3D occupancy prediction and discuss the challenges in this task. Second, we conduct a comprehensive survey of the progress in vision-based 3D occupancy prediction from three aspects: feature enhancement, deployment friendliness, and label efficiency, and provide an in-depth analysis of the potential and challenges of each category of methods. Finally, we summarize prevailing research trends and propose some future outlooks. To provide a valuable reference for researchers, a regularly updated collection of related papers, datasets, and code is organized at https://github.com/zya3d/Awesome-3D-Occupancy-Prediction.
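To make the task definition concrete, the sketch below shows one possible way to represent the output described in the abstract: a voxel grid around the ego vehicle in which each cell carries an occupancy/semantic label. It is an illustration only and is not taken from the paper. The class name `VoxelGrid`, the label id `FREE`, the sparse dictionary storage, and the grid extents used in the usage lines are all assumptions chosen for the example.

```python
from dataclasses import dataclass

FREE = 0  # hypothetical label id meaning "unoccupied"


@dataclass
class VoxelGrid:
    """Illustrative semantic occupancy grid around the ego vehicle."""
    origin: tuple  # (x, y, z) of the grid's minimum corner, in metres
    size: float    # edge length of one cubic voxel, in metres
    shape: tuple   # number of voxels along (x, y, z)

    def __post_init__(self):
        # Sparse storage: any voxel missing from the dict counts as FREE.
        self.labels = {}

    def index(self, point):
        """Map a 3D point to its voxel index, or None if outside the grid."""
        idx = tuple(int((p - o) // self.size) for p, o in zip(point, self.origin))
        inside = all(0 <= i < n for i, n in zip(idx, self.shape))
        return idx if inside else None

    def set_label(self, point, label):
        """Assign a semantic label to the voxel containing `point`."""
        idx = self.index(point)
        if idx is not None:
            self.labels[idx] = label

    def label_at(self, point):
        """Return the label of the voxel containing `point` (FREE if unset)."""
        idx = self.index(point)
        return FREE if idx is None else self.labels.get(idx, FREE)


# Usage with made-up extents: an 80 m x 80 m x 6 m volume at 0.5 m resolution.
grid = VoxelGrid(origin=(-40.0, -40.0, -1.0), size=0.5, shape=(160, 160, 12))
grid.set_label((1.2, -3.7, 0.3), 3)  # 3 is a hypothetical class id
print(grid.index((1.2, -3.7, 0.3)), grid.label_at((1.2, -3.7, 0.3)))
```

A prediction model would fill such a grid densely from multi-view images; the sparse dictionary here is only a convenience for keeping the example short.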

Paper Structure (23 sections, 3 equations, 20 figures, 5 tables)

Introduction
Background
The definition of vision-based 3D occupancy prediction
Ground truth generation
Datasets
Evaluation metrics
Key challenges
Feature enhancement methods
BEV-based methods
TPV-based methods
Voxel-based methods
Convolution-based methods
Query-based methods
Deployment-friendly methods
Perspective decomposition methods
...and 8 more sections

Figures (20)

Figure 1: Visual comparison of 3D occupancy annotations [tong2023scene]. (a) sparse occupancy; (b) dense occupancy.
Figure 2: The pipeline for generating dense 3D occupancy annotations.
Figure 3: Chronological overview of vision-based 3D occupancy prediction methods.
Figure 4: Hierarchically-structured taxonomy of vision-based 3D occupancy prediction for autonomous driving.
Figure 5: Illustration of BEV-based methods.
...and 15 more figures
