DG-MVP: 3D Domain Generalization via Multiple Views of Point Clouds for Classification

Huantao Ren, Minmin Yang, Senem Velipasalar

TL;DR

3D point cloud domain generalization is challenged by missing points and occlusion when transferring from CAD-sampled data to real-world scans. The authors propose DG-MVP, which replaces point-based backbones with six depth projections of a point cloud, uses Depth Pooling across views, and employs a Multi-scale Max Pooling head with a two-branch training loss, plus transformations to simulate real-world variations. They demonstrate state-of-the-art DG performance on PointDA-10 and Sim-to-Real benchmarks, often surpassing methods that use target-domain data. The work highlights the importance of multi-view depth representations and robust feature pooling for cross-domain generalization in 3D vision, offering a practical approach for synthetic-to-real classification without target data.
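To make the idea of six depth projections concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `six_view_depth_maps`, the orthographic camera model, the resolution `res`, and the "nearness" encoding (occupied pixels in [0.5, 1], empty pixels 0) are all assumptions chosen for illustration.

```python
import numpy as np

def six_view_depth_maps(points, res=32):
    """Orthographic depth maps of a point cloud seen from both ends of x, y and z.

    points: (N, 3) array. Returns a (6, res, res) array in which view
    2*axis looks from the low end of `axis` and view 2*axis+1 from the high end.
    Pixel values are a hypothetical "nearness" code: 0 = empty, 1 = closest.
    """
    pts = np.asarray(points, dtype=np.float64)
    # Scale into the unit cube with one shared factor so the shape is not distorted.
    lo = pts.min(axis=0)
    span = max((pts.max(axis=0) - lo).max(), 1e-12)
    pts = (pts - lo) / span

    views = np.zeros((6, res, res))
    for axis in range(3):
        # The two remaining axes span the image plane of this pair of views.
        u_ax, v_ax = [a for a in range(3) if a != axis]
        u = np.minimum((pts[:, u_ax] * res).astype(int), res - 1)
        v = np.minimum((pts[:, v_ax] * res).astype(int), res - 1)
        depth = pts[:, axis]
        for side, d in enumerate((depth, 1.0 - depth)):
            nearness = 1.0 - 0.5 * d  # in [0.5, 1]; larger means closer to the camera
            # Unbuffered maximum: each pixel keeps the point nearest to the camera.
            np.maximum.at(views[2 * axis + side], (v, u), nearness)
    return views

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))           # toy stand-in for a real point cloud
maps = six_view_depth_maps(cloud, res=32)
print(maps.shape)                            # (6, 32, 32)
```

Each view keeps only the surface facing its camera, so a region that is occluded or missing from one direction can still be observed from another. The six maps can then be fed to an ordinary 2D convolutional network.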

Abstract

Deep neural networks have achieved significant success in 3D point cloud classification, but they rely on large-scale, annotated point cloud datasets, which are labor-intensive to build. Compared to capturing data with LiDAR sensors and then annotating it, sampling point clouds from CAD models is relatively easy. Yet data sampled from CAD models is regular and does not suffer from the occlusion and missing points that are very common in LiDAR data, which creates a large domain shift. Therefore, it is critical to develop methods that generalize well across different point cloud domains. Existing 3D domain generalization methods employ point-based backbones to extract point cloud features. Yet, by analyzing point utilization of point-based methods and observing the geometry of point clouds from different domains, we have found that a large number of point features are discarded by point-based methods through the max-pooling operation. This is a significant waste, especially considering that domain generalization is more challenging than supervised learning and that point clouds are already affected by missing points and occlusion to begin with. To address these issues, we propose a novel method for 3D point cloud domain generalization, which can generalize to unseen domains of point clouds. Our proposed method employs multiple 2D projections of a 3D point cloud to alleviate the issue of missing points and involves a simple yet effective convolution-based model to extract features. The experiments, performed on the PointDA-10 and Sim-to-Real benchmarks, demonstrate the effectiveness of our proposed method, which outperforms different baselines and transfers well from the synthetic domain to the real-world domain.
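The claim that max-pooling discards most point features can be illustrated with a small experiment. The sketch below assumes a single global max-pool over per-point features, as in PointNet-style backbones; the helper name `max_pool_utilization` and the random toy features are hypothetical, and the analysis in the paper itself may be set up differently.

```python
import numpy as np

def max_pool_utilization(point_features):
    """Fraction of points that supply at least one channel of the pooled vector.

    point_features: (N, C) array holding one C-dimensional feature per point.
    """
    feats = np.asarray(point_features)
    winners = np.argmax(feats, axis=0)       # index of the winning point per channel
    return np.unique(winners).size / feats.shape[0]

rng = np.random.default_rng(0)
feats = rng.normal(size=(1024, 64))          # 1024 points, 64 channels (toy sizes)
pooled = feats.max(axis=0)                   # global max-pooled descriptor, shape (64,)
print(pooled.shape, max_pool_utilization(feats))
```

Because every channel of the pooled descriptor comes from exactly one point, at most C of the N points can contribute. With 1024 points and 64 channels, that bounds utilization at 64/1024, or about 6%, regardless of the feature values.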

Paper Structure

This paper contains 22 sections, 1 equation, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Different domain shifts observed with 2D images and 3D point clouds.
  • Figure 2: Classification accuracy versus the number of points retained after max-pooling for PointNet++, DGCNN, and GDANet on PointDA-10.
  • Figure 3: Example point clouds for chairs and their corresponding depth images for ModelNet and ScanNet datasets.
  • Figure 4: The pipeline of DG-MVP. $DT$, $DP$, $MMP$, $AVP$ and $FC$ represent Data Transformation, Depth Pooling, Multi-scale Max Pooling, Average Pooling and Fully Connected Layer, respectively. $H$, $W$, $C$, $P$ and $N$ denote height, width, the number of channels, the total number of strips and the number of classes, respectively.
  • Figure 5: Examples of augmented point clouds via creating a hole and non-uniform point density.
  • ...and 3 more figures
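The two augmentations named in the Figure 5 caption, creating a hole and non-uniform point density, can be sketched as follows. This is only a plausible reading of the caption: the function names, the spherical hole, the linear keep-probability falloff, and the parameters `radius` and `min_keep` are assumptions, not the transformations defined in the paper.

```python
import numpy as np

def cut_hole(points, radius, rng):
    """Drop every point within `radius` of a randomly chosen seed point."""
    seed = points[rng.integers(len(points))]
    keep = np.linalg.norm(points - seed, axis=1) > radius
    return points[keep]

def nonuniform_density(points, rng, min_keep=0.2):
    """Subsample so that density decays with distance from a random anchor point."""
    anchor = points[rng.integers(len(points))]
    dist = np.linalg.norm(points - anchor, axis=1)
    # Keep probability falls linearly from 1 at the anchor to min_keep at the farthest point.
    prob = 1.0 - (1.0 - min_keep) * dist / max(dist.max(), 1e-12)
    return points[rng.random(len(points)) < prob]

rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(2048, 3))   # toy stand-in for a CAD-sampled cloud
holed = cut_hole(cloud, radius=0.4, rng=rng)
sparse = nonuniform_density(cloud, rng=rng)
print(len(cloud), len(holed), len(sparse))
```

Both functions return a variable number of points, so a training pipeline built on them would typically resample or pad the result to a fixed size before batching.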