Learning to Adapt SAM for Segmenting Cross-domain Point Clouds
Xidong Peng, Runnan Chen, Feng Qiao, Lingdong Kong, Youquan Liu, Yujing Sun, Tai Wang, Xinge Zhu, Yuexin Ma
TL;DR
This paper tackles unsupervised domain adaptation for 3D LiDAR segmentation by aligning both source and target point features to the general feature space of the vision foundation model (VFM) SAM, using RGB images as an offline bridge between 2D and 3D representations. It introduces a SAM-guided 3D feature alignment loss $L_{align}$ and a Scene-Instance Hybrid Feature Augmentation that generates diverse cross-domain point clouds to strengthen alignment with SAM features. Evaluated on multiple cross-domain benchmarks, the method achieves state-of-the-art performance with large gains over strong baselines. Ablations confirm the contributions of the SAM-guided alignment and the augmentation strategies, and examine the integration of alternative VFMs. The approach reduces reliance on target-domain labels and suggests broader applicability to tasks such as panoptic segmentation and domain generalization, with potential extensions to 3D detection.
Abstract
Unsupervised domain adaptation (UDA) in 3D segmentation is a formidable challenge, primarily because point cloud data are sparse and unordered. For LiDAR point clouds in particular, the domain discrepancy becomes pronounced across different capture scenes, changing weather conditions, and the variety of LiDAR devices in use. Previous UDA methods have often sought to close this gap by aligning features directly between the source and target domains, but this approach falls short in 3D segmentation because the domain variations are substantial. Inspired by the remarkable generalization capability of the vision foundation model SAM in image segmentation, our approach leverages the general knowledge embedded in SAM to unify feature representations across diverse 3D domains and thereby address the 3D domain adaptation problem. Specifically, we use the images associated with the point clouds to facilitate knowledge transfer, and we propose a hybrid feature augmentation method, operating at both the scene and instance levels, that significantly improves the alignment between the 3D feature space and SAM's feature space. Our method is evaluated on several widely recognized datasets and achieves state-of-the-art performance.
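To make the idea of aligning 3D features to SAM's feature space concrete, the following is a minimal sketch of one plausible form of an alignment loss, not the paper's definition of $L_{align}$. It assumes that each LiDAR point has already been projected to an integer pixel location in the paired image, that SAM image features have been extracted offline into an `(H, W, C)` array, and that the 3D network's point features have the same channel dimension `C`. The function name `sam_alignment_loss`, the cosine-distance form, and all argument names are hypothetical choices made for illustration.

```python
import numpy as np


def sam_alignment_loss(point_feats, sam_feat_map, pixel_uv, eps=1e-8):
    """Mean cosine distance between point features and SAM pixel features.

    Assumed inputs (hypothetical names and shapes, not from the paper):
      point_feats  : (N, C) features from the 3D network, one row per point
      sam_feat_map : (H, W, C) SAM image features, precomputed offline
      pixel_uv     : (N, 2) integer pixel coordinates (u = column, v = row)
                     of each point projected into the image
    """
    h, w, _ = sam_feat_map.shape
    u, v = pixel_uv[:, 0], pixel_uv[:, 1]

    # Only points that fall inside the image have a paired SAM feature.
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    if not valid.any():
        return 0.0

    p = point_feats[valid]
    s = sam_feat_map[v[valid], u[valid]]  # gather features at (row, col)

    # L2-normalize both sets so the dot product is a cosine similarity.
    p = p / (np.linalg.norm(p, axis=1, keepdims=True) + eps)
    s = s / (np.linalg.norm(s, axis=1, keepdims=True) + eps)

    # 0 when features point the same way, 2 when they are opposite.
    return float(np.mean(1.0 - np.sum(p * s, axis=1)))
```

Under these assumptions, the same loss would be applied to point clouds from both the source and target domains, since paired images are available in each and no target labels are needed. In a training setting the computation would be written in an autodiff framework so that gradients flow into the 3D network while the SAM features stay fixed.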
