Deflickering Vision-Based Occupancy Networks through Lightweight Spatio-Temporal Correlation

Fengcheng Yu; Haoran Xu; Canming Xia; Ziyang Zong; Guang Tan

TL;DR

This work tackles flickering in vision-based occupancy networks (VONs) used for autonomous driving by introducing OccLinker, a lightweight plug-in that leverages historical static cues and high-frequency motion information. OccLinker tokenizes static and motion features into sparse representations and applies dual cross-attention to learn compact latent correlations with the current frame, producing a correction term that refines the base VON predictions without retraining the backbone. The authors demonstrate that OccLinker improves both spatial occupancy accuracy (IoU/mIoU) and temporal consistency across two benchmarks (SurroundOcc and Occ3D) with minimal overhead, achieving favorable accuracy-efficiency trade-offs. The method is modular and compatible with existing VONs, offering a practical path to deflicker occupancy predictions in real-time autonomous systems.

Abstract

Vision-based occupancy networks (VONs) provide an end-to-end solution for reconstructing 3D environments in autonomous driving. However, existing methods often suffer from temporal inconsistencies, manifesting as flickering effects that degrade temporal coherence and adversely affect downstream decision-making. While recent approaches incorporate historical information to alleviate this issue, they often incur high computational costs and may introduce misaligned or redundant features that interfere with object detection. We propose OccLinker, a novel plugin framework that can be easily integrated into existing VONs to improve performance. Our method efficiently consolidates historical static and motion cues, learns sparse latent correlations with current features through a dual cross-attention mechanism, and generates correction occupancy components to refine the base network predictions. In addition, we introduce a new temporal consistency metric to quantitatively measure flickering effects. Extensive experiments on two benchmark datasets demonstrate that our method achieves superior performance with minimal computational overhead while effectively reducing flickering artifacts.
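The refinement step described above (historical static and motion tokens, dual cross-attention against current features, and a correction added to the base prediction) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the tensor shapes, the concatenation-based fusion, the linear correction head, and the random weights are all assumptions chosen only to make the data flow concrete.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, tokens, w_q, w_k, w_v):
    """Single-head cross-attention: current features attend to history tokens.

    queries: (num_voxels, dim) features of the current frame
    tokens:  (num_tokens, dim) sparse historical tokens (static or motion)
    """
    q = queries @ w_q
    k = tokens @ w_k
    v = tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def make_params(rng, dim, attn_dim, num_classes):
    """Hypothetical random weights standing in for learned parameters."""
    def branch():
        return tuple(rng.standard_normal((dim, attn_dim)) * 0.1 for _ in range(3))

    return {
        "static": branch(),
        "motion": branch(),
        "head": rng.standard_normal((2 * attn_dim, num_classes)) * 0.1,
    }


def refine_occupancy(base_logits, current_feats, static_tokens, motion_tokens, params):
    """Add a correction term to the frozen base network's occupancy logits."""
    # Two attention branches: one over static cues, one over motion cues.
    static_ctx = cross_attention(current_feats, static_tokens, *params["static"])
    motion_ctx = cross_attention(current_feats, motion_tokens, *params["motion"])
    # Assumed fusion: concatenate both contexts, then project to class logits.
    fused = np.concatenate([static_ctx, motion_ctx], axis=-1)
    correction = fused @ params["head"]
    return base_logits + correction


rng = np.random.default_rng(0)
num_voxels, dim, attn_dim, num_classes = 6, 8, 4, 3
params = make_params(rng, dim, attn_dim, num_classes)
base_logits = rng.standard_normal((num_voxels, num_classes))
current_feats = rng.standard_normal((num_voxels, dim))
static_tokens = rng.standard_normal((5, dim))   # sparse static tokens from history
motion_tokens = rng.standard_normal((2, dim))   # sparse motion tokens
refined = refine_occupancy(base_logits, current_feats, static_tokens, motion_tokens, params)
```

Because the correction is additive, the base network's output is left untouched when the correction head outputs zeros, which is one way to read the claim that the plug-in refines predictions without retraining the backbone.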
