Stream and Query-guided Feature Aggregation for Efficient and Effective 3D Occupancy Prediction

Seokha Moon; Janghyun Baek; Giseop Kim; Jinkyu Kim; Sunwook Choi

TL;DR

DuOcc tackles the accuracy–efficiency trade-off in 3D occupancy prediction by introducing a dual aggregation framework that preserves dense voxel geometry while remaining computationally efficient. StreamAgg accumulates voxel features over time with motion-aware warping and lightweight refinement, while QueryAgg injects instance-level dynamic object information via deformable attention and selective aggregation. The combined approach yields state-of-the-art results on Occ3D-nuScenes and SurroundOcc under real-time constraints, while reducing memory usage by over 40% compared to prior methods. This work advances practical 3D scene understanding for autonomous driving by enabling high-fidelity occupancy maps with efficient, real-time processing.

Abstract

3D occupancy prediction has become a key perception task in autonomous driving, as it enables comprehensive scene understanding. Recent methods enhance this understanding by incorporating spatiotemporal information through multi-frame fusion, but they suffer from a trade-off: dense voxel-based representations provide high accuracy at significant computational cost, whereas sparse representations improve efficiency but lose spatial detail. To mitigate this trade-off, we introduce DuOcc, which employs a dual aggregation strategy that retains dense voxel representations to preserve spatial fidelity while maintaining high efficiency. DuOcc consists of two key components: (i) Stream-based Voxel Aggregation, which recurrently accumulates voxel features over time and refines them to suppress warping-induced distortions, preserving a clear separation between occupied and free space; and (ii) Query-guided Aggregation, which compensates for the limitations of voxel accumulation by selectively injecting instance-level query features into the voxel regions occupied by dynamic objects. Experiments on the widely used Occ3D-nuScenes and SurroundOcc datasets demonstrate that DuOcc achieves state-of-the-art performance in real-time settings, while reducing memory usage by over 40% compared to prior methods.
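To make the two aggregation ideas concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation. It assumes a dense `(X, Y, Z, C)` feature grid, a known rigid ego-motion transform, nearest-neighbour warping, a fixed blending weight `alpha` standing in for the learned refinement, and boolean voxel masks standing in for the deformable attention that places query features. All function and parameter names (`warp_prev_to_curr`, `stream_update`, `inject_queries`, `alpha`) are hypothetical.

```python
import numpy as np

def warp_prev_to_curr(prev_feat, T_prev_from_curr, voxel_size, origin):
    """Nearest-neighbour warp of a (X, Y, Z, C) grid into the current ego frame.

    T_prev_from_curr is a 4x4 rigid transform that maps points expressed in the
    current frame to the previous frame (an assumed convention).
    Returns the warped grid and a mask of voxels that found a valid source.
    """
    X, Y, Z, _ = prev_feat.shape
    idx = np.stack(
        np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z), indexing="ij"),
        axis=-1,
    )
    # Metric centres of the current-frame voxels.
    centers = origin + (idx + 0.5) * voxel_size
    homo = np.concatenate([centers, np.ones((X, Y, Z, 1))], axis=-1)
    # Row-vector form of T @ p for every voxel centre.
    prev_pts = homo @ T_prev_from_curr.T
    src = np.floor((prev_pts[..., :3] - origin) / voxel_size).astype(int)
    valid = np.all((src >= 0) & (src < np.array([X, Y, Z])), axis=-1)
    warped = np.zeros_like(prev_feat)
    s = src[valid]
    warped[valid] = prev_feat[s[:, 0], s[:, 1], s[:, 2]]
    return warped, valid

def stream_update(prev_feat, curr_feat, T_prev_from_curr, voxel_size, origin, alpha=0.5):
    """Recurrent accumulation: blend warped history into the current features.

    The fixed blend is a stand-in for a learned refinement step.
    """
    warped, valid = warp_prev_to_curr(prev_feat, T_prev_from_curr, voxel_size, origin)
    fused = curr_feat.copy()
    fused[valid] = alpha * warped[valid] + (1.0 - alpha) * curr_feat[valid]
    return fused

def inject_queries(voxel_feat, query_feats, query_masks):
    """Selective injection: add each instance query feature to its masked voxels."""
    out = voxel_feat.copy()
    for q, m in zip(query_feats, query_masks):
        out[m] += q  # broadcasts the (C,) query over the masked voxels
    return out
```

In a streaming loop, the output of `stream_update` at one step would be passed as `prev_feat` at the next, so only a single history grid is kept in memory rather than a stack of past frames.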
