Semantic Segmentation Algorithm Based on Light Field and LiDAR Fusion

Jie Luo, Yuxuan Jiang, Xin Jin, Mingyu Liu, Yihui Fan

TL;DR

This work tackles occlusion and density-mismatch challenges in semantic segmentation by fusing light field images with LiDAR point clouds. It introduces TrafficScene, the first multimodal dataset with full-view light field annotations aligned to LiDAR, and presents Mlpfseg, a two-branch network with a Point-Pixel Feature Fusion Module and a Depth Difference Perception Module to fuse modalities and improve occlusion awareness. Experimental results on TrafficScene show that Mlpfseg surpasses single-modality and prior multimodal methods, achieving higher $mIoU$ for both image and point cloud tasks, with notable gains for small and occluded objects. The dataset and method collectively offer a practical pathway toward more robust autonomous driving perception, with future work focusing on joint depth estimation within the network.

Abstract

Semantic segmentation is a cornerstone of scene understanding in autonomous driving, but it still faces significant challenges under complex conditions such as occlusion. Light field and LiDAR modalities provide complementary visual and spatial cues that benefit robust perception; however, their effective integration is hindered by limited viewpoint diversity and inherent modality discrepancies. To address these challenges, we propose the first multimodal semantic segmentation dataset that integrates light field data and point cloud data. Based on this dataset, we propose a multimodal light field point-cloud fusion segmentation network (Mlpfseg), which incorporates feature completion and depth perception to segment camera images and LiDAR point clouds simultaneously. The feature completion module addresses the density mismatch between point clouds and image pixels by performing differential reconstruction of point-cloud feature maps, which improves the fusion of the two modalities. The depth perception module improves the segmentation of occluded objects by reinforcing attention scores for better occlusion awareness. Our method outperforms image-only segmentation by 1.71 mean Intersection over Union (mIoU) and point-cloud-only segmentation by 2.38 mIoU, demonstrating its effectiveness.
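
Since the results above are reported in mIoU, a minimal sketch of how that metric is commonly computed may help. This is the standard confusion-matrix definition, not the authors' evaluation code; the function names `confusion_matrix` and `mean_iou` are hypothetical, and the choice to skip classes that appear in neither the labels nor the predictions is an assumption that the paper's protocol may not share.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count (true class, predicted class) pairs over all pixels or points."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Assumption: labels outside [0, num_classes) mark unlabeled samples and are ignored.
    valid = (y_true >= 0) & (y_true < num_classes)
    flat_index = num_classes * y_true[valid] + y_pred[valid]
    counts = np.bincount(flat_index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mean_iou(y_true, y_pred, num_classes):
    """Mean of per-class IoU = TP / (TP + FP + FN), as a fraction in [0, 1]."""
    cm = confusion_matrix(y_true, y_pred, num_classes)
    true_pos = np.diag(cm)
    # Row sums are ground-truth counts, column sums are prediction counts.
    union = cm.sum(axis=0) + cm.sum(axis=1) - true_pos
    present = union > 0  # assumption: skip classes absent from both labels and predictions
    return float((true_pos[present] / union[present]).mean())

# Toy example with two classes and four samples.
labels = [0, 0, 1, 1]
preds = [0, 1, 1, 1]
score = mean_iou(labels, preds, num_classes=2)
```

The same function applies to both tasks: for images the inputs are per-pixel label maps, and for LiDAR they are per-point label arrays. Multiplying the returned fraction by 100 gives the percentage scale on which mIoU is usually reported.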

Paper Structure

This paper contains 18 sections, 20 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Examples of the data we collected.
  • Figure 2: Multimodal acquisition system.
  • Figure 3: The proportion of annotated pixels (y-axis) per class (x-axis) in TrafficScene, Cityscapes, and UrbanLF.
  • Figure 4: Internal structure of the multimodal light field point cloud fusion segmentation network. It mainly consists of two parts: the point-pixel interpolation fusion module (PFFM) and the depth difference perception module (DDPM).
  • Figure 5: mIoU for PSPNet, PSPNet LGA and Mlpfseg on small objects across all viewing angles.
  • ...and 2 more figures