LACV-Net: Semantic Segmentation of Large-Scale Point Cloud Scene via Local Adaptive and Comprehensive VLAD
Ziyin Zeng, Yongyang Xu, Zhong Xie, Wei Tang, Jie Wan, Weichao Wu
TL;DR
The paper tackles the challenge of large-scale point cloud semantic segmentation, where down-sampling can erase local details and global context is hard to capture. It introduces LACV-Net, which combines Local Adaptive Feature Augmentation (LAFA) for robust local context with a Comprehensive VLAD (C-VLAD) for rich global descriptors, and adds an aggregation loss to sharpen boundaries and speed training. LAFA comprises a Local Information Encoding unit and an Adaptive Augmentation unit to encode multi-modal local cues and adaptively weigh neighboring points, while C-VLAD fuses multi-layer, multi-scale, and multi-resolution features into a unified global representation. Experiments on S3DIS, Toronto3D, and SensatUrban demonstrate state-of-the-art performance with favorable efficiency, confirming the method's effectiveness for scalable 3D scene understanding.
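C-VLAD builds on VLAD (vector of locally aggregated descriptors). VLAD summarises a set of local descriptors by accumulating their residuals to a small set of cluster centres. The following is a minimal NumPy sketch of a NetVLAD-style soft-assignment VLAD applied to descriptors pooled from several resolutions. It makes several assumptions:

- the per-layer features have already been projected to a common dimension D;
- the centres are fixed;
- assignment uses a distance-based softmax.

The function names `soft_vlad` and `comprehensive_vlad`, the parameter `alpha`, and the plain concatenation used for fusion are hypothetical simplifications. They are not LACV-Net's actual C-VLAD.

```python
import numpy as np

def soft_vlad(features, centers, alpha=10.0):
    """NetVLAD-style soft-assignment VLAD (illustrative sketch).

    features: (N, D) local descriptors; centers: (K, D) cluster centres.
    Returns a flattened, L2-normalised (K*D,) global descriptor.
    """
    # Residual of every descriptor to every centre: (N, K, D)
    residuals = features[:, None, :] - centers[None, :, :]
    d2 = (residuals ** 2).sum(axis=-1)                  # squared distances (N, K)
    # Soft assignment: softmax over centres of -alpha * distance (stabilised)
    logits = -alpha * d2
    logits -= logits.max(axis=1, keepdims=True)
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)
    # Aggregate weighted residuals per centre: V[k] = sum_i a_ik * (x_i - c_k)
    vlad = (assign[:, :, None] * residuals).sum(axis=0)  # (K, D)
    # Intra-normalisation per centre, then global L2 normalisation
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    vlad = vlad.ravel()
    return vlad / (np.linalg.norm(vlad) + 1e-12)

def comprehensive_vlad(layer_features, centers, alpha=10.0):
    """Hypothetical C-VLAD-style fusion: pool descriptors from several
    layers/resolutions (assumed already in a common dimension D) into one
    set and encode them with a single soft VLAD."""
    pooled = np.concatenate(layer_features, axis=0)
    return soft_vlad(pooled, centers, alpha)
```

Suppose we pool feature sets of 1024, 256 and 64 points, each with dimension 8, and use 4 centres. The result is a unit-norm global vector of length 4 × 8 = 32. A segmentation network would then combine such a vector with per-point features. How LACV-Net does this is not specified in the abstract.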
Abstract
Large-scale point cloud semantic segmentation is an important task in 3D computer vision, with wide applications in autonomous driving, robotics, and virtual reality. Current large-scale point cloud semantic segmentation methods usually apply down-sampling to improve computational efficiency and to obtain multi-resolution point clouds. However, down-sampling can discard local information. Meanwhile, it is difficult for networks to capture global information across the widely distributed contexts of large-scale scenes. To capture local and global information effectively, we propose LACV-Net, an end-to-end deep neural network for large-scale point cloud semantic segmentation. The proposed network contains three main components: 1) a local adaptive feature augmentation module (LAFA) that adaptively learns the similarity between centroids and their neighboring points to augment the local context; 2) a comprehensive VLAD module (C-VLAD) that fuses multi-layer, multi-scale, and multi-resolution local features into a comprehensive global description vector; and 3) an aggregation loss function that effectively optimizes the segmentation boundaries by constraining the adaptive weights from the LAFA module. Comparisons with state-of-the-art networks on several large-scale benchmark datasets, including S3DIS, Toronto3D, and SensatUrban, demonstrate the effectiveness of the proposed network.
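The abstract gives neither LAFA's formulas nor the exact form of the aggregation loss. The following minimal NumPy sketch illustrates the general idea under stated assumptions:

- each centroid weights its k nearest neighbours by a softmax over feature similarity;
- each centroid adds the weighted sum of its neighbours' features to its own feature;
- a loss on those weights rewards placing weight on neighbours of the same class.

Several choices here are hypothetical and made for illustration only: the scaled dot-product similarity, the brute-force kNN, and the names `knn_indices`, `adaptive_augment` and `aggregation_loss`. The loss formula (negative log of the weight mass on same-label neighbours) is likewise assumed. None of this is the authors' implementation. In the actual network the similarity would presumably be computed by learned layers on the encoded local information.

```python
import numpy as np

def knn_indices(xyz, k):
    """Brute-force k nearest neighbours; with distinct points, each point
    is its own nearest neighbour (column 0)."""
    d2 = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(axis=-1)  # (N, N)
    return np.argsort(d2, axis=1)[:, :k]                           # (N, k)

def adaptive_augment(feats, xyz, k=8):
    """Hypothetical similarity-based adaptive neighbour weighting.

    Returns augmented features (N, D), adaptive weights (N, k), and
    neighbour indices (N, k).
    """
    idx = knn_indices(xyz, k)
    nbr = feats[idx]                                    # (N, k, D)
    # Similarity between each centroid and its neighbours (scaled dot product)
    scores = np.einsum("nd,nkd->nk", feats, nbr) / np.sqrt(feats.shape[1])
    scores -= scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)                   # softmax over the k neighbours
    # Augment each centroid with the weighted sum of its neighbours' features
    augmented = feats + np.einsum("nk,nkd->nd", w, nbr)
    return augmented, w, idx

def aggregation_loss(w, idx, labels, eps=1e-8):
    """Hypothetical aggregation loss: penalise weight mass that falls on
    neighbours whose label differs from the centroid's label."""
    same = labels[idx] == labels[:, None]               # (N, k) boolean mask
    mass = (w * same).sum(axis=1)                       # weight on same-class neighbours
    return float(-np.log(mass + eps).mean())
```

Boundary points are the ones whose neighbourhoods mix classes. This loss is therefore largest there. Minimising it pushes their weights toward same-class neighbours, which is one plausible reading of "optimizing the segmentation boundaries by constraining the adaptive weights".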
