LACV-Net: Semantic Segmentation of Large-Scale Point Cloud Scene via Local Adaptive and Comprehensive VLAD

Ziyin Zeng, Yongyang Xu, Zhong Xie, Wei Tang, Jie Wan, Weichao Wu

TL;DR

The paper tackles the challenge of large-scale point cloud semantic segmentation, where down-sampling can erase local details and global context is hard to capture. It introduces LACV-Net, which combines Local Adaptive Feature Augmentation (LAFA) for robust local context with a Comprehensive VLAD (C-VLAD) for rich global descriptors, and adds an aggregation loss to sharpen boundaries and speed training. LAFA comprises a Local Information Encoding unit and an Adaptive Augmentation unit to encode multi-modal local cues and adaptively weigh neighboring points, while C-VLAD fuses multi-layer, multi-scale, and multi-resolution features into a unified global representation. Experiments on S3DIS, Toronto3D, and SensatUrban demonstrate state-of-the-art performance with favorable efficiency, confirming the method's effectiveness for scalable 3D scene understanding.
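
The idea of adaptively weighting neighboring points by their similarity to a centroid can be illustrated with a minimal sketch. It is not the paper's LAFA module: the function name `adaptive_neighbor_aggregate`, the `temperature` parameter, and the choice of negative squared Euclidean distance followed by a softmax are all assumptions made here for illustration, whereas the paper learns its weights inside the network.

```python
import numpy as np

def adaptive_neighbor_aggregate(center_feat, neighbor_feats, temperature=1.0):
    """Aggregate K neighbor features, weighting each by similarity to the centroid.

    center_feat:    (D,) feature of the centroid point
    neighbor_feats: (K, D) features of its K neighbors
    Returns the aggregated feature (D,) and the weights (K,).
    """
    # Assumption: similarity is the negative squared distance in feature space.
    diff = neighbor_feats - center_feat
    scores = -np.sum(diff ** 2, axis=1) / temperature
    # Softmax over the K neighbors, shifted for numerical stability.
    scores = scores - scores.max()
    weights = np.exp(scores)
    weights /= weights.sum()
    # Weighted sum of neighbor features.
    aggregated = weights @ neighbor_feats
    return aggregated, weights

# Hypothetical toy input: two neighbors close to the centroid, one far away.
center = np.array([1.0, 0.0])
neighbors = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 2.0]])
agg, w = adaptive_neighbor_aggregate(center, neighbors)
print(w)    # the distant neighbor receives a weight close to zero
print(agg)
```

In this toy setting, a neighbor lying across an object boundary (far away in feature space) contributes almost nothing to the aggregated feature, which is the intuition behind weighting neighbors adaptively rather than uniformly.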

Abstract

Large-scale point cloud semantic segmentation is an important task in 3D computer vision, with wide application in autonomous driving, robotics, and virtual reality. Current methods for this task usually rely on down-sampling operations to improve computational efficiency and to obtain multi-resolution point clouds. However, down-sampling may discard local information. Meanwhile, it is difficult for networks to capture global information in large-scale distributed contexts. To capture local and global information effectively, we propose an end-to-end deep neural network called LACV-Net for large-scale point cloud semantic segmentation. The proposed network contains three main components: 1) a local adaptive feature augmentation module (LAFA) that adaptively learns the similarity between centroids and neighboring points to augment the local context; 2) a comprehensive VLAD module (C-VLAD) that fuses multi-layer, multi-scale, and multi-resolution local features into a comprehensive global description vector; and 3) an aggregation loss function that optimizes the segmentation boundaries by constraining the adaptive weights from the LAFA module. Comparisons with state-of-the-art networks on several large-scale benchmark datasets, including S3DIS, Toronto3D, and SensatUrban, demonstrate the effectiveness of the proposed network.
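
For readers unfamiliar with VLAD (Vector of Locally Aggregated Descriptors), the sketch below shows the classic hard-assignment form: each local descriptor is assigned to its nearest cluster center, the residuals are summed per cluster, and the result is flattened and L2-normalized into one global vector. This is the generic technique only, not the paper's C-VLAD module, which additionally fuses multi-layer, multi-scale, and multi-resolution features. The function name `vlad` and the toy inputs are hypothetical.

```python
import numpy as np

def vlad(descriptors, centers):
    """Classic hard-assignment VLAD.

    descriptors: (N, D) local feature vectors
    centers:     (K, D) cluster centers (e.g. obtained from k-means)
    Returns an L2-normalized global descriptor of shape (K * D,).
    """
    # Squared distance from every descriptor to every center: shape (N, K).
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)  # index of the nearest center per descriptor

    K, D = centers.shape
    v = np.zeros((K, D))
    for k in range(K):
        members = descriptors[assign == k]
        if len(members):
            # Sum of residuals between the members and their center.
            v[k] = (members - centers[k]).sum(axis=0)

    v = v.reshape(-1)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Hypothetical toy input: three 2-D descriptors and two centers.
descs = np.array([[0.0, 1.0], [1.0, 0.0], [4.0, 4.0]])
cents = np.array([[0.0, 0.0], [4.0, 5.0]])
g = vlad(descs, cents)
print(g.shape)  # (4,)
```

The output length depends only on the number of centers and the feature dimension, not on the number of input points, which is what makes a VLAD-style vector usable as a fixed-size global description of a variable-size point set.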

Paper Structure (31 sections, 13 equations, 10 figures, 11 tables)

This paper contains 31 sections, 13 equations, 10 figures, 11 tables.

Figures (10)

  • Figure 1: Architecture of the proposed network.
  • Figure 2: Proposed local adaptive feature augmentation module (LAFA).
  • Figure 3: Schematic with and without adaptive weighting. (a) Without the adaptive weighting unit; (b) with the adaptive weighting unit.
  • Figure 4: Proposed comprehensive VLAD module (C-VLAD).
  • Figure 5: Visual comparison of semantic segmentation results on S3DIS dataset.
  • ...and 5 more figures