
AdaOcc: Adaptive-Resolution Occupancy Prediction

Chao Chen, Ruoyu Wang, Yuliang Guo, Cheng Zhao, Xinyu Huang, Chen Feng, Liu Ren

TL;DR

AdaOcc is a novel adaptive-resolution, multi-modal prediction approach that integrates object-centric 3D reconstruction and holistic occupancy prediction within a single framework, performing highly detailed and precise 3D reconstruction only in regions of interest (ROIs).

Abstract

Autonomous driving in complex urban scenarios requires 3D perception to be both comprehensive and precise. Traditional 3D perception methods focus on object detection, resulting in sparse representations that lack environmental detail. Recent approaches estimate 3D occupancy around vehicles for a more comprehensive scene representation. However, dense 3D occupancy prediction increases computational demands, challenging the balance between efficiency and resolution. High-resolution occupancy grids offer accuracy but demand substantial computational resources, while low-resolution grids are efficient but lack detail. To address this dilemma, we introduce AdaOcc, a novel adaptive-resolution, multi-modal prediction approach. Our method integrates object-centric 3D reconstruction and holistic occupancy prediction within a single framework, performing highly detailed and precise 3D reconstruction only in regions of interest (ROIs). These highly detailed 3D surfaces are represented as point clouds, so their precision is not constrained by the predefined grid resolution of the occupancy map. We conducted comprehensive experiments on the nuScenes dataset, demonstrating significant improvements over existing methods. In close-range scenarios, we surpass previous baselines by over 13% in IoU and over 40% in Hausdorff distance. In summary, AdaOcc offers a more versatile and effective framework for delivering accurate 3D semantic occupancy prediction across diverse driving scenarios.
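The two metrics named in the abstract can be illustrated with a minimal sketch. This is not the paper's evaluation code: it assumes occupancy is given as a set of occupied voxel indices and surfaces as small lists of 3D points, and the function names (voxel_iou, hausdorff) are hypothetical. The brute-force nearest-neighbor search is only suitable for small inputs.

```python
import math


def voxel_iou(pred, gt):
    """Intersection over union of two sets of occupied voxel indices (i, j, k)."""
    pred, gt = set(pred), set(gt)
    union = pred | gt
    if not union:
        # Both grids empty: treated as a perfect match (a convention assumed here).
        return 1.0
    return len(pred & gt) / len(union)


def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest neighbor in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical toy data, not taken from the paper.
pred_voxels = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
gt_voxels = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}
iou = voxel_iou(pred_voxels, gt_voxels)  # 2 shared voxels out of 4 in the union

pred_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
hd = hausdorff(pred_points, gt_points)  # driven by the unmatched point (1, 2, 0)
```

IoU rewards overlap in aggregate, whereas the Hausdorff distance is set by the single worst-matched point, which is why it is sensitive to surface detail that a coarse grid cannot express.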

Paper Structure (21 sections, 10 equations, 6 figures, 7 tables)


Figures (6)

  • Figure 1: AdaOcc is a multimodal, adaptive-resolution approach designed for high precision in regions of interest while maintaining overall resource efficiency. On the left, AdaOcc's outputs are visualized using bounding boxes, voxels, and point clouds. On the right, visual comparisons are made between low, adaptive, and high-resolution occupancy map predictions, evaluated across five indicators that assess both accuracy and efficiency. The adaptive-resolution approach represents the data in surface format by applying surface reconstruction to point clouds. It yields the most balanced scores, effectively accommodating diverse driving tasks while managing computational costs.
  • Figure 2: Impact of Grid Resolution. The surrounding scene captured by the RGB images can be represented in occupancy maps with different voxel sizes. The measured distance between the same two cars with grid sizes of 0.8m and 0.2m can differ by up to 0.6m (1.4m - 0.8m). This discrepancy can significantly affect precise navigation in safety-critical scenarios.
  • Figure 3: AdaOcc Pipeline. We combine a low-resolution occupancy map and a high-resolution object point cloud to create an adaptive-resolution map. The green boxes represent three outputs from our spatial-temporal encoder: occupancy prediction, 3D object detection, and point cloud reconstruction. The 2D-3D encoder varies with the backbone: BEVFormer projects image features to a BEV feature volume, while CONet projects them to a 3D feature volume. We consider a BEV feature as a special case of a 3D feature volume in which the depth dimension equals 1.
  • Figure 4: Qualitative Results on BEVFormer, CONet, and AdaOcc_B. As observed, BEVFormer and CONet produce erroneous connections between different objects. Benefiting from adaptive resolution, AdaOcc distinctly separates each car with clear margins.
  • Figure I: Misaligned boxes lead to misalignment of the reconstructed objects. We also illustrate how the Hausdorff distance works.
  • ...and 1 more figure
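The grid-resolution effect described in the Figure 2 caption can be sketched in one dimension. This is an illustrative simplification rather than the paper's procedure: it assumes a grid anchored at 0, non-negative coordinates, and that a cell counts as occupied whenever an object overlaps any part of it. The function name measured_gap_cm and the car positions are hypothetical, and lengths are in integer centimeters to keep the arithmetic exact.

```python
def measured_gap_cm(a_end_cm, b_start_cm, cell_cm):
    """Free space between two objects along one axis after voxelization.

    Object A ends at a_end_cm and object B starts at b_start_cm. Because any
    overlap marks a cell as occupied, A's far edge rounds up to the next cell
    boundary and B's near edge rounds down to the previous one.
    """
    a_edge = -(-a_end_cm // cell_cm) * cell_cm  # ceiling to a cell boundary
    b_edge = (b_start_cm // cell_cm) * cell_cm  # floor to a cell boundary
    return max(b_edge - a_edge, 0)


# Hypothetical layout: car A spans 0-450 cm, car B starts at 590 cm.
true_gap = 590 - 450                          # 140 cm
fine_gap = measured_gap_cm(450, 590, 20)      # 0.2 m cells
coarse_gap = measured_gap_cm(450, 590, 80)    # 0.8 m cells
```

Under these assumptions the coarse grid inflates both objects more than the fine grid does, so the measured gap shrinks as the cell size grows. In this one-dimensional model each object edge can shift by just under one cell, which bounds the error per object.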