MR3D-Net: Dynamic Multi-Resolution 3D Sparse Voxel Grid Fusion for LiDAR-Based Collective Perception

Sven Teufel, Jörg Gamerdinger, Georg Volk, Oliver Bringmann

TL;DR

This work addresses the bandwidth and information-loss challenges of LiDAR-based collective perception (CP) by introducing MR3D-Net, a dynamic multi-resolution 3D sparse voxel grid fusion backbone. Vehicles exchange sparse voxel grids at multiple resolutions, and the received grids are fused via a scatter operation. MR3D-Net achieves state-of-the-art 3D object detection performance on the OPV2V benchmark while reducing the required bandwidth by up to 94% compared to early fusion. The results indicate that a configurable, exchangeable environment representation can preserve geometric detail and offers an alternative to both early and late fusion. In practice, this could enable scalable CP in real-world V2X networks with lower bandwidth and a standardized data representation, and the backbone can be integrated with existing detectors such as PV-RCNN++.
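To make the fusion step more concrete, the following is a minimal NumPy sketch of merging sparse voxel grids with a scatter-style reduction. It is not the authors' implementation. It assumes that all grids have already been transformed into a common coordinate frame and share one resolution, and it assumes `max` as the reduction. The function name `fuse_sparse_grids` and the toy data are hypothetical.

```python
import numpy as np

def fuse_sparse_grids(coords_list, feats_list):
    """Fuse sparse voxel grids that share one coordinate frame and resolution.

    coords_list: list of (N_i, 3) integer voxel indices (assumed layout)
    feats_list:  list of (N_i, C) float voxel features (assumed layout)
    """
    coords = np.concatenate(coords_list, axis=0)
    feats = np.concatenate(feats_list, axis=0)
    # Unique voxel coordinates, plus the output slot of every input voxel.
    fused_coords, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # the shape of `inverse` varies across NumPy versions
    # Scatter with a max reduction: voxels seen by several vehicles keep the
    # element-wise maximum of their features.
    fused_feats = np.full((len(fused_coords), feats.shape[1]), -np.inf)
    np.maximum.at(fused_feats, inverse, feats)
    return fused_coords, fused_feats

# Toy example: the ego vehicle and one other vehicle overlap in voxel (1, 0, 0).
ego_coords = np.array([[0, 0, 0], [1, 0, 0]])
ego_feats = np.array([[1.0], [2.0]])
other_coords = np.array([[1, 0, 0], [2, 0, 0]])
other_feats = np.array([[5.0], [3.0]])

fused_coords, fused_feats = fuse_sparse_grids(
    [ego_coords, other_coords], [ego_feats, other_feats]
)
print(fused_coords)
print(fused_feats)
```

Because the reduction only needs a coordinate list and a feature list per sender, the number of contributing vehicles can change from frame to frame without changing the code.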

Abstract

The safe operation of automated vehicles depends on their ability to perceive the environment comprehensively. However, occlusion, sensor range, and environmental factors limit their perception capabilities. To overcome these limitations, collective perception enables vehicles to exchange information. However, fusing this exchanged information is a challenging task. Early fusion approaches require large amounts of bandwidth, while intermediate fusion approaches face interchangeability issues. Late fusion of shared detections is currently the only feasible approach. However, it often results in inferior performance due to information loss. To address this issue, we propose MR3D-Net, a dynamic multi-resolution 3D sparse voxel grid fusion backbone architecture for LiDAR-based collective perception. We show that sparse voxel grids at varying resolutions provide a meaningful and compact environment representation that can adapt to the communication bandwidth. MR3D-Net achieves state-of-the-art performance on the OPV2V 3D object detection benchmark while reducing the required bandwidth by up to 94% compared to early fusion. Code is available at https://github.com/ekut-es/MR3D-Net
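The link between voxel resolution and message size can be illustrated with a small sketch. It voxelizes a synthetic point cloud at three voxel sizes and counts the occupied voxels. The voxel sizes, the uniformly random points, and the helper `voxelize` are assumptions chosen for illustration. The sketch does not use the paper's configuration and is not meant to reproduce the reported 94% figure.

```python
import numpy as np

def voxelize(points, voxel_size):
    """Return the unique integer voxel indices occupied by an (N, 3) point cloud."""
    idx = np.floor(points / voxel_size).astype(np.int64)
    return np.unique(idx, axis=0)

# Synthetic stand-in for a LiDAR scan (assumption: uniform points in a 10 m cube).
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 10.0, size=(20000, 3))

# Hypothetical voxel sizes in metres, from fine to coarse.
voxel_counts = {size: len(voxelize(points, size)) for size in (0.1, 0.2, 0.4)}
for size, count in voxel_counts.items():
    print(f"voxel size {size} m: {count} occupied voxels")
```

Coarser grids occupy fewer voxels, so a sender could choose the resolution that fits the currently available bandwidth at the cost of geometric detail.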

Paper Structure (15 sections, 2 figures, 2 tables)

This paper contains 15 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Scatter operation in the one-dimensional case using the $\max$ function [fey2019fast].
  • Figure 2: Architecture of MR3D-Net. The collective backbone consists of three input streams which take sparse voxel grids at different resolutions as input. The local backbone takes the voxelized ego point cloud as input and shares the structure with the high-resolution input stream from the collective backbone. The color of the sparse convolution blocks corresponds to their input resolution.
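
The one-dimensional scatter operation from Figure 1 can be sketched in a few lines of NumPy. This is an illustrative stand-in, not the routine used in the paper: the helper `scatter_max` and the example values are hypothetical, and `np.maximum.at` is used here only because it applies the maximum correctly when indices repeat.

```python
import numpy as np

def scatter_max(src, index, dim_size):
    """Reduce all src values that share an index by taking their maximum."""
    out = np.full(dim_size, -np.inf, dtype=float)
    # ufunc.at is unbuffered, so repeated indices are all taken into account.
    np.maximum.at(out, index, src)
    return out

src = np.array([5.0, 1.0, 7.0, 2.0, 3.0, 2.0, 1.0, 3.0])
index = np.array([0, 0, 1, 0, 2, 2, 3, 3])

result = scatter_max(src, index, dim_size=4)
print(result)  # [5. 7. 3. 3.]
```

Each output position collects the values whose index points to it: position 0 receives 5, 1, and 2 and keeps 5, while position 1 receives only 7.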