Multi-view Structural Convolution Network for Domain-Invariant Point Cloud Recognition of Autonomous Vehicles

Younggun Kim, Mohamed Abdel-Aty, Beomsik Cho, Seonghoon Ryoo, Soomok Lee

TL;DR

The paper tackles domain shifts in LiDAR-based point cloud recognition for autonomous vehicles by introducing MSCN, a network that extracts robust local and global geometric features via Structural Convolution Layers and Structural Aggregation Layers. It further strengthens domain invariance through unseen-domain generation using an adapted Progressive Domain Expansion framework, enabling training with synthetic domain variants. The approach achieves an average cross-domain accuracy of 82.0%, outperforming the PointTransformer baseline by 15.8%, with MSCN+ showing additional gains, and demonstrates real-time feasibility with fast inference on AV-scale data. Collectively, MSCN provides a practical path toward reliable domain-generalized perception in diverse sensing conditions and road environments.

Abstract

Point cloud representation has recently become a research hotspot in the field of computer vision and has been utilized for autonomous vehicles. However, adapting deep learning networks for point cloud data recognition is challenging due to the variability in datasets and sensor technologies. This variability underscores the necessity for adaptive techniques to maintain accuracy under different conditions. In this paper, we present the Multi-View Structural Convolution Network (MSCN) designed for domain-invariant point cloud recognition. MSCN comprises Structural Convolution Layers (SCL) that extract local context geometric features from point clouds and Structural Aggregation Layers (SAL) that extract and aggregate both local and overall context features from point clouds. Furthermore, MSCN enhances feature robustness by training with unseen domain point clouds generated from the source domain, enabling the model to acquire domain-invariant representations. Extensive cross-domain experiments demonstrate that MSCN achieves an average accuracy of 82.0%, surpassing the strong baseline PointTransformer by 15.8%, confirming its effectiveness under real-world domain shifts. Our code is available at https://github.com/MLMLab/MSCN.

Paper Structure

This paper contains 30 sections, 19 equations, 6 figures, 9 tables.

Figures (6)

  • Figure 1: Illustration of LiDAR-based recognition's three key challenges, including different sensor configurations, geographic locations, and sim-to-real gap.
  • Figure 2: The Multi-view Structural Convolution Network (MSCN) architecture for 3D point cloud classification comprises several components. (a) The MSCN includes feature extraction layers, specifically Structural Convolution Layers (SCL) and Structural Aggregation Layers (SAL), as well as Global Max Pooling and a multi-layer perceptron (MLP) for classification. (b) SCL is engineered to extract local features from each point. (c) The SAL is designed to combine local context features with overall context features.
  • Figure 3: The architecture for generating unseen domains in MSCN to learn more robust feature representations includes the following components. In (a), the modified MSCN is shown, consisting of a feature extractor (F), a classification head (C), and a projection head (P) with the projection head added for contrastive learning. In (b), the process of generating arbitrary unseen domains and training the MSCN with both source and generated data is illustrated. These generation and training processes are performed alternately.
  • Figure 4: Examples of geometric transformations on ShapeNetPart. "Scale 10" denotes scaling the original point cloud by a factor of 10, and "shift 90" denotes a random translation of the point cloud by a distance of 90 units.
  • Figure 5: Evaluation of geometric invariance properties on ModelNet40. Each graph shows the accuracy of various models under a specific geometric transformation applied to the point clouds. (a) Translation: accuracy as the translation magnitude increases from 0 to 200 units. (b) Scaling: accuracy as the scale factor increases from 0.01 to 100.
  • ...and 1 more figure
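
The scaling and translation perturbations described in the captions for Figures 4 and 5 can be illustrated with a short sketch. This is not the authors' code. The helper names `scale_cloud`, `shift_cloud`, and `centroid` are hypothetical, the toy cloud is random rather than taken from ShapeNetPart or ModelNet40, and "shift 90" is assumed here to mean one rigid translation of the whole cloud along a random direction.

```python
import math
import random

def scale_cloud(points, factor):
    """Multiply every coordinate by `factor` (e.g. 10.0 for 'scale 10')."""
    return [(x * factor, y * factor, z * factor) for x, y, z in points]

def shift_cloud(points, distance, rng):
    """Translate the whole cloud by `distance` units along one random direction.

    Assumption: the shift is a single rigid offset shared by all points.
    """
    # Draw an isotropic Gaussian vector and normalize it to get a unit direction.
    while True:
        direction = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in direction))
        if norm > 1e-12:
            break
    ox, oy, oz = (distance * c / norm for c in direction)
    return [(x + ox, y + oy, z + oz) for x, y, z in points]

def centroid(points):
    """Mean position of the cloud."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

# Toy cloud: 128 points sampled uniformly in the cube [-1, 1]^3.
rng = random.Random(0)
cloud = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(128)]

scaled = scale_cloud(cloud, 10.0)        # analogous to "scale 10"
shifted = shift_cloud(cloud, 90.0, rng)  # analogous to "shift 90"

# A rigid shift moves the centroid by the full distance but keeps distances
# between points; scaling multiplies every pairwise distance by the factor.
print(math.dist(centroid(cloud), centroid(shifted)))
print(math.dist(shifted[0], shifted[1]) - math.dist(cloud[0], cloud[1]))
print(math.dist(scaled[0], scaled[1]) / math.dist(cloud[0], cloud[1]))
```

A model that consumes raw coordinates sees very different inputs after these transformations, while quantities built from relative geometry change predictably or not at all. That is the kind of robustness the accuracy curves in Figure 5 are meant to probe.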