
Deep Height Decoupling for Precise Vision-based 3D Occupancy Prediction

Yuan Wu, Zhiqiang Yan, Zhengxue Wang, Xiang Li, Le Hui, Jian Yang

TL;DR

Deep Height Decoupling (DHD) is a novel framework that incorporates an explicit height prior to filter out confusing features from 2D image features, achieving state-of-the-art performance on the popular Occ3D-nuScenes benchmark.

Abstract

The task of vision-based 3D occupancy prediction aims to reconstruct 3D geometry and estimate its semantic classes from 2D color images, where the 2D-to-3D view transformation is an indispensable step. Most previous methods conduct forward projection, such as BEVPooling and VoxelPooling, both of which map 2D image features into 3D grids. However, a grid meant to represent features within a certain height range usually also receives many confusing features that belong to other height ranges. To address this challenge, we present Deep Height Decoupling (DHD), a novel framework that incorporates an explicit height prior to filter out the confusing features. Specifically, DHD first predicts height maps via explicit supervision. Based on height distribution statistics, DHD introduces Mask Guided Height Sampling (MGHS) to adaptively decouple the height map into multiple binary masks. MGHS projects the 2D image features into multiple subspaces, where each grid contains only features within a reasonable height range. Finally, a Synergistic Feature Aggregation (SFA) module enhances the feature representation through channel and spatial affinities, enabling further occupancy refinement. On the popular Occ3D-nuScenes benchmark, our method achieves state-of-the-art performance even with minimal input frames. Source code is released at https://github.com/yanzq95/DHD.
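The mask-guided projection described above can be illustrated with a minimal sketch. It assumes a precomputed BEV cell index for each pixel, scalar per-pixel features instead of C-dimensional vectors, and no depth lifting. The interval boundaries in `HEIGHT_INTERVALS` and all function names are hypothetical and are not taken from the DHD code. The sketch contrasts MGHS-style per-interval grids with a BEVPooling-style baseline that collapses the height dimension.

```python
from typing import List, Sequence, Tuple

# Hypothetical height intervals in meters (three, echoing Figure 4); DHD derives
# its actual boundaries from dataset height statistics, which are not shown here.
HEIGHT_INTERVALS: List[Tuple[float, float]] = [(-1.0, 0.5), (0.5, 2.0), (2.0, 5.4)]


def height_to_masks(heights: Sequence[float],
                    intervals: Sequence[Tuple[float, float]]) -> List[List[int]]:
    """Decouple a flattened height map into one binary mask per height interval."""
    return [[1 if lo <= h < hi else 0 for h in heights] for lo, hi in intervals]


def mask_guided_pool(features: Sequence[float], cells: Sequence[int],
                     masks: Sequence[Sequence[int]], num_cells: int) -> List[List[float]]:
    """Build one BEV grid per subspace; a pixel contributes only where its mask is 1."""
    grids: List[List[float]] = []
    for mask in masks:
        grid = [0.0] * num_cells
        for feat, cell, keep in zip(features, cells, mask):
            if keep:
                grid[cell] += feat  # scatter-add into the pixel's BEV cell
        grids.append(grid)
    return grids


def bev_pool(features: Sequence[float], cells: Sequence[int], num_cells: int) -> List[float]:
    """BEVPooling-style baseline: every pixel lands in one grid, whatever its height."""
    grid = [0.0] * num_cells
    for feat, cell in zip(features, cells):
        grid[cell] += feat
    return grid


if __name__ == "__main__":
    feats = [1.0, 2.0, 4.0, 8.0]    # toy per-pixel features
    cells = [0, 0, 1, 1]            # toy BEV cell index of each pixel
    heights = [0.1, 1.5, 0.2, 3.0]  # toy predicted heights (m)
    masks = height_to_masks(heights, HEIGHT_INTERVALS)
    print(mask_guided_pool(feats, cells, masks, num_cells=2))  # [[1.0, 4.0], [2.0, 0.0], [0.0, 8.0]]
    print(bev_pool(feats, cells, num_cells=2))                 # [3.0, 12.0]
```

In this toy case the baseline mixes the 0.2 m and 3.0 m pixels in cell 1 (value 12.0), while the decoupled grids keep them apart (4.0 and 8.0). Because these hypothetical intervals cover every toy height, summing the subspace grids recovers the baseline grid, so the masks separate features rather than discard them.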

Paper Structure

This paper contains 16 sections, 10 equations, 9 figures, and 3 tables.

Figures (9)

  • Figure 1: Projection comparison. (a) VoxelPooling [huang2021bevdet, li2023bevstereo] retains height but overlooks class-specific height distributions. (b) BEVPooling [yu2023flashocc, yu2024panoptic] sacrifices height details by collapsing the height dimension. In contrast, (c) our mask guided height sampling (MGHS) selectively projects 2D features based on object heights, preserving more accurate and detailed features.
  • Figure 2: Height distribution of different classes on Occ3D-nuScenes [tian2024occ3d].
  • Figure 3: An overview of our deep height decoupling (DHD) framework (see the DHD method section for details).
  • Figure 4: (a) We decouple height into three intervals to differentiate features across heights and list the proportion of each class below. (b) The distribution of various classes across different heights, with the bar chart presenting statistical values within each interval.
  • Figure 5: Semantic and geometric analysis of Occ3D-nuScenes [tian2024occ3d]. (a) The heatmap illustrates the normalized distribution of each category across different heights. (b) The cumulative distribution function (CDF) curve suggests that the data are concentrated in specific height layers.
  • ...and 4 more figures
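The per-class interval proportions behind Figure 4 and the CDF view of heights in Figure 5 can be computed along these lines. This is a sketch on hypothetical toy (class, height) samples, not Occ3D-nuScenes statistics. The interval boundaries, sample values, and function names are illustrative assumptions.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical interval boundaries (m) and toy samples; not Occ3D-nuScenes data.
HEIGHT_INTERVALS: List[Tuple[float, float]] = [(-1.0, 0.5), (0.5, 2.0), (2.0, 5.4)]
SAMPLES: List[Tuple[str, float]] = [
    ("road", -0.9), ("road", -0.8),
    ("car", 0.3), ("car", 0.9), ("car", 1.2), ("car", 1.6),
    ("vegetation", 1.8), ("vegetation", 3.0), ("vegetation", 4.5), ("vegetation", 5.0),
]


def interval_proportions(samples: Sequence[Tuple[str, float]],
                         intervals: Sequence[Tuple[float, float]]) -> Dict[str, List[float]]:
    """Fraction of each class's occupied voxels that fall in each height interval."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0] * len(intervals))
    for cls, h in samples:
        for k, (lo, hi) in enumerate(intervals):
            if lo <= h < hi:
                counts[cls][k] += 1
                break  # intervals are disjoint, so stop at the first match
    return {cls: [c / sum(row) for c in row] for cls, row in counts.items()}


def empirical_cdf(heights: Sequence[float], queries: Sequence[float]) -> List[float]:
    """Empirical CDF: share of heights that are <= each query height."""
    ordered = sorted(heights)
    return [bisect_right(ordered, q) / len(ordered) for q in queries]


if __name__ == "__main__":
    print(interval_proportions(SAMPLES, HEIGHT_INTERVALS))
    print(empirical_cdf([h for _, h in SAMPLES], [0.5, 2.0]))  # [0.3, 0.7]
```

Evaluating the CDF at the interval boundaries shows how much of the data each height layer holds. A steep rise within a narrow band is the kind of concentration that Figure 5(b) describes.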