
V2VSSC: A 3D Semantic Scene Completion Benchmark for Perception with Vehicle to Vehicle Communication

Yuanfang Zhang, Junxuan Li, Kaiqing Luo, Yiying Yang, Jiayi Han, Nian Liu, Denghui Qin, Peng Han, Chengpei Xu

TL;DR

The paper tackles occlusion-limited single-vehicle 3D Semantic Scene Completion (SSC) by introducing V2VSSC, a framework and benchmark that enables collaborative SSC via Vehicle-to-Vehicle communication. It builds a CARLA-based semantic occupancy dataset derived from OPV2V, with four models across six classes and three fusion strategies (early, intermediate, late) plus a no-fusion baseline, to evaluate multi-view SSC performance. Experiments show that cooperative perception improves SSC by up to 8.3 percentage points in IoU and 6 percentage points in mIoU. The paper also analyzes the effects of time delay, localization errors, and feature compression. The results demonstrate the practical potential of cross-vehicle sensing to mitigate occlusion and enhance autonomous navigation, with future work aimed at real-world validation.

Abstract

Semantic scene completion (SSC) has recently gained popularity because it can provide both semantic and geometric information that can be used directly for autonomous vehicle navigation. However, there are still challenges to overcome. SSC is often hampered by occlusion and short-range perception due to sensor limitations, which can pose safety risks. This paper proposes a fundamental solution to this problem by leveraging vehicle-to-vehicle (V2V) communication. We propose the first generalized collaborative SSC framework that allows autonomous vehicles to share sensing information from different sensor views to jointly perform SSC tasks. To validate the proposed framework, we further build V2VSSC, the first V2V SSC benchmark, on top of the large-scale V2V perception dataset OPV2V. Extensive experiments demonstrate that by leveraging V2V communication, the SSC performance can be increased by 8.3% on the geometric metric IoU and by 6.0% on mIoU.
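
The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the convention common in SSC benchmarks, which this summary does not confirm for V2VSSC: IoU is a geometric score over occupied versus empty voxels, and mIoU averages per-class IoU over the semantic classes, excluding empty space. The label value `EMPTY = 0`, the function names, and the toy data are hypothetical choices made only for this example.

```python
from typing import Sequence

EMPTY = 0  # assumed label for free space (hypothetical convention)


def completion_iou(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Geometric IoU: every non-empty voxel counts as 'occupied'."""
    inter = sum(1 for p, g in zip(pred, gt) if p != EMPTY and g != EMPTY)
    union = sum(1 for p, g in zip(pred, gt) if p != EMPTY or g != EMPTY)
    return inter / union if union else float("nan")


def mean_iou(pred: Sequence[int], gt: Sequence[int], num_classes: int) -> float:
    """Mean of per-class IoU over semantic classes 1..num_classes."""
    ious = []
    for c in range(1, num_classes + 1):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy flattened voxel grids with two semantic classes (1 and 2).
gt = [0, 1, 1, 2, 2, 0, 1, 2]
pred = [0, 1, 2, 2, 2, 1, 1, 0]

iou = completion_iou(pred, gt)  # 5 shared occupied voxels out of 7 in the union
miou = mean_iou(pred, gt, 2)    # each class scores 2/4, so the mean is 0.5
```

In this toy case the geometric IoU (5/7) is higher than the mIoU (0.5) because voxels that are correctly marked occupied but given the wrong class still count toward the former.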

Paper Structure

This paper contains 15 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Two samples from our V2VSSC. Top: the aggregated point cloud from surrounding CAVs. Middle: the four camera views of the ego vehicle. Bottom: the semantic occupancy map generated by the ego vehicle.
  • Figure 2: Label Distribution. X-axis represents the semantic category, Y-axis represents the number of voxels.
  • Figure 3: The structure of the intermediate fusion pipeline.
  • Figure 4: Time delay and location error
  • Figure 5: mIoU with respect to data size (log scale), based on the VoxelNet detector. The number followed by $\times$ denotes the compression rate.