Event-aided Semantic Scene Completion

Shangwei Guo, Hao Shi, Song Wang, Xiaoting Yin, Kailun Yang, Kaiwei Wang

TL;DR

This work tackles robust 3D scene understanding for autonomous driving by augmenting Semantic Scene Completion with event-camera data. It introduces DSEC-SSC, the first real-world event-enabled SSC dataset with a deployable 4D labeling pipeline, and EvSSC, an RGB-Event fusion framework built around an Event-aided Lifting Module (ELM) that bridges 2D features to 3D occupancy. EvSSC demonstrates consistent gains across transformer- and LSS-based SSC models, achieving up to $52.5\%$ relative improvement in $mIoU$ on corrupted data and enhanced performance under motion blur and adverse weather, while maintaining modest latency and memory overhead. The publicly released dataset and codebase support broader adoption and further exploration of event-based semantic scene understanding for safer, more reliable autonomous perception.

Abstract

Autonomous driving systems rely on robust 3D scene understanding. Recent advances in Semantic Scene Completion (SSC) for autonomous driving underscore the limitations of RGB-based approaches, which struggle under motion blur, poor lighting, and adverse weather. Event cameras, offering high dynamic range and low latency, address these challenges by providing asynchronous data that complements RGB inputs. We present DSEC-SSC, the first real-world benchmark specifically designed for event-aided SSC, which includes a novel 4D labeling pipeline for generating dense, visibility-aware labels that adapt dynamically to object motion. Our proposed RGB-Event fusion framework, EvSSC, introduces an Event-aided Lifting Module (ELM) that effectively bridges 2D RGB-Event features to 3D space, enhancing view transformation and the robustness of 3D volume construction across SSC models. Extensive experiments on DSEC-SSC and simulated SemanticKITTI-E demonstrate that EvSSC is adaptable to both transformer-based and LSS-based SSC architectures. Notably, evaluations on SemanticKITTI-C demonstrate that EvSSC consistently improves prediction accuracy across five degradation modes in both In-domain and Out-of-domain settings, achieving up to a 52.5% relative improvement in mIoU when the image sensor partially fails. Additionally, we quantitatively and qualitatively validate the superiority of EvSSC under motion blur and extreme weather conditions, which are challenging for autonomous driving. The established datasets and our codebase will be made publicly available at https://github.com/Pandapan01/EvSSC.
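The headline number above is a relative improvement in mIoU, not an absolute gain in mIoU points. As a minimal sketch of how such a figure is typically computed (the function names, the `ignore_label` value of 255, and all numbers below are illustrative assumptions, not taken from the paper or its codebase):

```python
def miou(pred, target, num_classes, ignore_label=255):
    """Mean intersection-over-union across classes.

    pred, target: flat sequences of per-voxel class ids.
    Voxels whose target equals ignore_label are skipped (assumed convention).
    Classes absent from both pred and target are left out of the mean.
    """
    ious = []
    for c in range(num_classes):
        inter = 0
        union = 0
        for p, t in zip(pred, target):
            if t == ignore_label:
                continue
            in_pred, in_target = (p == c), (t == c)
            inter += int(in_pred and in_target)
            union += int(in_pred or in_target)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Toy labels, made up for illustration only.
pred = [0, 1, 1, 2, 2, 2]
target = [0, 1, 2, 2, 2, 255]
score = miou(pred, target, num_classes=3)

# Hypothetical scores: moving from 0.40 to 0.50 mIoU is a gain of
# 0.10 in absolute terms but about 25% in relative terms.
gain = relative_improvement(0.40, 0.50)
```

Because the relative figure divides by the baseline score, a large percentage can correspond to a small absolute gain when the baseline is low, as is common on heavily corrupted inputs.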

Paper Structure

This paper contains 19 sections, 11 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: (a) Role of event data in enhancing semantic scene completion: Under challenging lighting conditions, RGB-based methods struggle to detect low-contrast objects, whereas event data enhances visibility and improves 3D occupancy predictions. (b) Performance comparison on the corrupted SemanticKITTI-C: mIoU results across Out-of-Domain and In-Domain scenarios, with and without event data integration, showing scores under various conditions, including Motion Blur (MB), Fog (F), Brightness (B), Darkness (D), and Shot Noise (SN).
  • Figure 2: Overview of the occupancy label generation pipeline. The pipeline consists of three main steps: Semantic-Maps-Guided Dynamic Object Segmentation (Sec. \ref{sec:semantic_guided_dynamic_object_segmentation}), Dynamic Object 4D Reconstruction (Sec. \ref{sec:dynamic_objects_4d_reconstruction}), and Probability-Guided Voxel Refinement (Sec. \ref{sec:probability_guided_voxel_refinement}).
  • Figure 3: Label distribution in the DSEC-SSC dataset, with the y-axis plotted on a logarithmic scale.
  • Figure 4: Comparison of point clouds (a) without and (b) with dynamic object processing.
  • Figure 5: (a) Event-image 2D-to-3D fusion paradigms, shown as a spectrum: (Left) 2D feature fusion after encoding; (Middle) ELM, which fuses 2D or 3D features during the lifting process; (Right) voxel fusion prior to the segmentation head. (b) EvSSC: Given events and RGB images, 2D features are extracted. In ELM, camera and level embeddings are incorporated, and the image keys and values ($k$, $v$) and the event keys and values are fused through self-attention to obtain the fused $k$ and $v$. Mask tokens and voxel queries are added to complete the voxel features through deformable attention. The segmentation head then produces the 3D occupancy prediction.
  • ...and 3 more figures
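The fusion-during-lifting idea in the Figure 5 caption can be illustrated with a rough NumPy sketch. This is not the authors' ELM. It assumes single-head attention without learned projections, substitutes plain dense attention for the deformable attention the caption mentions, uses one embedding vector per modality as a stand-in for the camera and level embeddings, and merges the two token sets by averaging. All function and parameter names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention; q: (M, C), k and v: (N, C)."""
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v


def fuse_kv(img_feat, evt_feat, img_emb, evt_emb):
    """Fuse image and event tokens with self-attention over their union.

    img_feat, evt_feat: (N, C) flattened 2D features at the same N pixels.
    img_emb, evt_emb: (C,) stand-ins for the embeddings (assumption).
    Averaging the two halves back to N tokens is an assumption of this sketch.
    """
    n = img_feat.shape[0]
    tokens = np.concatenate([img_feat + img_emb, evt_feat + evt_emb], axis=0)
    mixed = attention(tokens, tokens, tokens)  # (2N, C)
    return 0.5 * (mixed[:n] + mixed[n:])       # (N, C)


def lift(voxel_queries, fused):
    """Voxel queries (M, C) gather 2D evidence from fused tokens (N, C).

    Dense attention replaces deformable attention here for brevity.
    """
    return voxel_queries + attention(voxel_queries, fused, fused)


rng = np.random.default_rng(0)
img = rng.normal(size=(6, 4))      # 6 pixels, 4 channels
evt = rng.normal(size=(6, 4))
fused = fuse_kv(img, evt, rng.normal(size=4), rng.normal(size=4))
voxels = lift(rng.normal(size=(3, 4)), fused)  # 3 voxel queries
```

The point of the sketch is the ordering: the two modalities are mixed before the voxel queries read from them, so the 3D volume is built from already-fused 2D evidence rather than fused afterward in voxel space.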