FSF-Net: Enhance 4D Occupancy Forecasting with Coarse BEV Scene Flow for Autonomous Driving

Erxin Guo, Pei An, You Yang, Qiong Liu, An-An Liu

TL;DR

FSF-Net is a 4D occupancy forecasting framework built on coarse BEV scene flow, which targets the difficulty of modeling how occupancy evolves over time. It combines a BEV Flow module with a VQ-Mamba latent-feature path and fuses their outputs with a U-Net-based quality fusion (UQF) network to produce refined future occupancy maps. On the Occ3D dataset, it reports IoU and mIoU 9.56% and 10.87% higher than the state-of-the-art method, and it also shows favorable motion-planning metrics without requiring maps or bounding-box supervision. Ablation studies support the contribution of each component, highlighting the importance of coarse-to-fine fusion and temporal modeling for robust 4D forecasting in autonomous driving.

Abstract

4D occupancy forecasting is an important technique for autonomous driving because it helps avoid potential risks in complex traffic scenes. Scene flow is a crucial element for describing how a 4D occupancy map evolves. However, accurate scene flow is difficult to predict in real scenes. In this paper, we find that BEV scene flow can approximately represent 3D scene flow in most traffic scenes, and that coarse BEV scene flow is easy to generate. Based on this observation, we propose FSF-Net, a 4D occupancy forecasting method built on coarse BEV scene flow. First, we develop a general occupancy forecasting architecture based on coarse BEV scene flow. Then, to further enhance the representation ability of 4D occupancy features, we propose a vector-quantized Mamba (VQ-Mamba) network to mine spatial-temporal structural scene features. Next, to effectively fuse the coarse occupancy maps forecasted from BEV scene flow and from latent features, we design a U-Net-based quality fusion (UQF) network that generates the fine-grained forecasting result. Extensive experiments are conducted on the public Occ3D dataset. FSF-Net achieves IoU and mIoU 9.56% and 10.87% higher than the state-of-the-art method. Hence, we believe that the proposed FSF-Net benefits the safety of autonomous driving.
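
The abstract reports results as IoU and mIoU on Occ3D. As a minimal sketch of how these two metrics are commonly computed on semantic occupancy grids, the Python below treats IoU as a binary occupied-versus-free score and mIoU as the mean of per-class IoU. The function names, the `free_label` parameter, and the choice to skip classes absent from both grids are assumptions made for illustration, not details taken from the paper or from the Occ3D evaluation code.

```python
import numpy as np

def occupancy_iou(pred, gt, free_label):
    """Binary IoU: every voxel whose label is not `free_label` counts as occupied."""
    pred_occ = pred != free_label
    gt_occ = gt != free_label
    union = np.logical_or(pred_occ, gt_occ).sum()
    if union == 0:
        return float("nan")  # both grids are empty, IoU is undefined
    return float(np.logical_and(pred_occ, gt_occ).sum() / union)

def occupancy_miou(pred, gt, class_labels):
    """Mean of per-class IoU over `class_labels` (assumed to exclude the free label)."""
    ious = []
    for c in class_labels:
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # assumption: a class absent from both grids is skipped
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")

# Toy 2x2 example with label 0 as free space (hypothetical labels).
pred = np.array([[0, 1], [2, 2]])
gt = np.array([[0, 1], [1, 2]])
iou = occupancy_iou(pred, gt, free_label=0)        # 1.0: the occupied sets match
miou = occupancy_miou(pred, gt, class_labels=[1, 2])  # 0.5: each class has IoU 1/2
```

The toy example shows why the two numbers can diverge: the geometry is predicted perfectly, but one voxel carries the wrong semantic label, so only mIoU drops.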

Paper Structure

This paper contains 15 sections, 12 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: The BEV Flow module and the VQ-Mamba module each produce a coarse scene prediction, the former from voxel movement relationships and the latter from a neural network. The Quality-Fusion module then aggregates both results into the final detailed prediction.
  • Figure 2: (a) Framework of mainstream methods. (b) Framework based on scene flow. (c) Framework of the fusion method.
  • Figure 3: Late Fusion can combine the advantages of both prediction methods to achieve performance improvements.
  • Figure 4: Composition of the prediction network.
  • Figure 5: Visualization of the prediction results of the first two frames for three scenarios by OccWorld and FSF-Net.
  • ...and 1 more figure
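
The Figure 1 caption describes a coarse prediction obtained from voxel movement relationships. As a rough illustration of that idea only, the sketch below shifts every occupied voxel by the 2D flow of its BEV cell, so all voxels in a vertical column share one displacement. The function name, the grid layout `(X, Y, Z)`, the flow layout `(X, Y, 2)` in units of cells, and the rounding to whole cells are all assumptions; this is not the authors' BEV Flow module.

```python
import numpy as np

def warp_occupancy_with_bev_flow(occ, bev_flow, free_label=0):
    """Forward-warp a semantic occupancy grid with a per-cell BEV displacement.

    occ:      integer labels, shape (X, Y, Z)
    bev_flow: displacement in cells, shape (X, Y, 2), shared by a whole column
    """
    size_x, size_y, _ = occ.shape
    warped = np.full_like(occ, free_label)

    # Indices of every occupied voxel in the current frame.
    xs, ys, zs = np.nonzero(occ != free_label)

    # Round the coarse flow to whole cells; height (z) is left unchanged.
    tx = xs + np.rint(bev_flow[xs, ys, 0]).astype(int)
    ty = ys + np.rint(bev_flow[xs, ys, 1]).astype(int)

    # Drop voxels that would leave the grid.
    inside = (tx >= 0) & (tx < size_x) & (ty >= 0) & (ty < size_y)

    # If two voxels land on the same target, which label survives is arbitrary.
    warped[tx[inside], ty[inside], zs[inside]] = occ[xs[inside], ys[inside], zs[inside]]
    return warped

# Toy scene: one static voxel and one two-voxel column moving by (+2, +1) cells.
occ = np.zeros((4, 4, 2), dtype=int)
occ[0, 0, 0] = 1
occ[1, 1, 0] = 3
occ[1, 1, 1] = 5
flow = np.zeros((4, 4, 2))
flow[1, 1] = (2.0, 1.0)
future = warp_occupancy_with_bev_flow(occ, flow)
```

A forward warp like this leaves holes and cannot create newly visible structure, which is consistent with the caption's framing of the flow-based result as coarse and in need of later fusion.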