Semantic Segmentation Algorithm Based on Light Field and LiDAR Fusion
Jie Luo, Yuxuan Jiang, Xin Jin, Mingyu Liu, Yihui Fan
TL;DR
This work tackles occlusion and density-mismatch challenges in semantic segmentation by fusing light field images with LiDAR point clouds. It introduces TrafficScene, the first multimodal dataset with full-view light field annotations aligned to LiDAR, and presents Mlpfseg, a two-branch network with a Point-Pixel Feature Fusion Module and a Depth Difference Perception Module to fuse modalities and improve occlusion awareness. Experimental results on TrafficScene show that Mlpfseg surpasses single-modality and prior multimodal methods, achieving higher mIoU for both image and point cloud tasks, with notable gains for small and occluded objects. The dataset and method collectively offer a practical pathway toward more robust autonomous driving perception, with future work focusing on joint depth estimation within the network.
Abstract
Semantic segmentation is a cornerstone of scene understanding in autonomous driving, but it still struggles under complex conditions such as occlusion. Light field and LiDAR modalities provide complementary visual and spatial cues that benefit robust perception; however, their effective integration is hindered by limited viewpoint diversity and inherent modality discrepancies. To address these challenges, we propose the first multimodal semantic segmentation dataset that integrates light field data with point cloud data. Based on this dataset, we propose a multi-modal light field point-cloud fusion segmentation network (Mlpfseg), which incorporates feature completion and depth perception to segment camera images and LiDAR point clouds simultaneously. The feature completion module addresses the density mismatch between point clouds and image pixels by performing differential reconstruction of point-cloud feature maps, which improves the fusion of the two modalities. The depth perception module improves the segmentation of occluded objects by reinforcing attention scores for better occlusion awareness. Our method outperforms image-only segmentation by 1.71 mean Intersection over Union (mIoU) and point cloud-only segmentation by 2.38 mIoU, demonstrating its effectiveness.
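
For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU is commonly computed from per-element class labels (image pixels or LiDAR points). The function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are illustrative assumptions, not details taken from the paper's evaluation protocol.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union over flat sequences of class ids.

    Assumption: classes that appear in neither `pred` nor `target`
    are excluded from the average (a common convention).
    """
    inter = [0] * num_classes  # per-class true positives
    union = [0] * num_classes  # per-class |pred ∪ target|
    for p, t in zip(pred, target):
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            # A mismatch adds one element to the union of both classes.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with two classes: IoU is 1/2 for class 0 and 2/3 for class 1.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

The same function applies to both tasks in the abstract: for images the inputs are flattened per-pixel labels, and for point clouds they are per-point labels.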
