PointDGMamba: Domain Generalization of Point Cloud Classification via Generalized State Space Model

Hao Yang, Qianyu Zhou, Haijia Sun, Xiangtai Li, Fengqi Liu, Xuequan Lu, Lizhuang Ma, Shuicheng Yan

TL;DR

This work targets Domain Generalization (DG) in point cloud classification using a state space model. It introduces PointDGMamba, a framework with three components: Masked Sequence Denoising (MSD), Sequence-wise Cross-domain Feature Aggregation (SCFA), and Dual-level Domain Scanning (DDS). The framework offers global receptive fields with linear complexity. A new multi-domain benchmark, PointDG-3to1, is proposed to better simulate real-world DG scenarios. Experiments on PointDA-10 and PointDG-3to1 show state-of-the-art generalization performance, and ablations and visual analyses support the design choices.

Abstract

Domain Generalization (DG) has recently been explored to improve the generalizability of point cloud classification (PCC) models toward unseen domains. However, existing methods often suffer from limited receptive fields or quadratic complexity because they rely on convolutional neural networks or vision Transformers. In this paper, we present the first work that studies the generalizability of state space models (SSMs) in DG PCC, and we find that directly applying SSMs to DG PCC encounters several challenges. First, the inherent topology of the point cloud tends to be disrupted during the serialization stage, which leads to noise accumulation. Second, the lack of designs for domain-agnostic feature learning and data scanning introduces unanticipated domain-specific information into the 3D sequence data. To this end, we propose a novel framework, PointDGMamba, that generalizes strongly to unseen domains and has the advantages of global receptive fields and efficient linear complexity. PointDGMamba consists of three components: Masked Sequence Denoising (MSD), Sequence-wise Cross-domain Feature Aggregation (SCFA), and Dual-level Domain Scanning (DDS). In particular, MSD selectively masks out the noisy point tokens of the point cloud sequences. SCFA introduces cross-domain but same-class point cloud features to encourage the model to extract more generalized features. DDS includes intra-domain scanning and cross-domain scanning to facilitate information exchange between features. In addition, we propose a new and more challenging benchmark, PointDG-3to1, for multi-domain generalization. Extensive experiments demonstrate the effectiveness and state-of-the-art performance of PointDGMamba.
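
To make the "global receptive fields and efficient linear complexity" claim concrete, the sketch below runs a generic diagonal linear state-space recurrence over a serialized token sequence. It is an illustration of the SSM idea only, not the PointDGMamba implementation: the function name `ssm_scan` and the parameters `a`, `b`, `c` are assumptions made for this example, and the recurrence is time-invariant, whereas Mamba-style models use input-dependent (selective) parameters.

```python
from typing import List, Sequence


def ssm_scan(
    x: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> List[float]:
    """Run a diagonal linear state-space recurrence over a 1-D sequence.

    h_t = a * h_{t-1} + b * x_t   (elementwise over the state dimension)
    y_t = sum(c * h_t)

    All names here are hypothetical and chosen for illustration.
    """
    state = [0.0] * len(a)
    outputs: List[float] = []
    for x_t in x:  # single pass, so cost grows linearly with sequence length
        state = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, state, b)]
        outputs.append(sum(c_i * h_i for c_i, h_i in zip(c, state)))
    return outputs


if __name__ == "__main__":
    # An impulse at position 0 still contributes to every later output.
    print(ssm_scan([1.0, 0.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))
    # [1.0, 0.5, 0.25, 0.125]
```

Each output depends on every earlier token through the carried state, while the work per token stays constant. Self-attention, in contrast, compares every pair of tokens. Because the recurrence is order-dependent, the way a point cloud is serialized into a sequence matters, which is the concern the abstract raises about the serialization stage.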

Paper Structure (25 sections, 6 equations, 6 figures, 10 tables)

Figures (6)

  • Figure 1: Left: Comparison between previous works and our PointDGMamba. Previous domain generalization (DG)-based point cloud classification (PCC) methods often rely on Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) to learn domain-invariant features, and they suffer from either limited receptive fields with local perception (a) or quadratic complexity with global perception (b). Middle: In contrast, we propose a novel framework, PointDGMamba (c), that generalizes strongly to unseen domains and has the advantages of global receptive fields and efficient linear complexity. Our PointDGMamba consists of Masked Sequence Denoising (MSD), Sequence-wise Cross-domain Feature Aggregation (SCFA), and Dual-level Domain Scanning (DDS). Right: On the widely used PointDA-10 benchmark and our proposed PointDG-3to1 benchmark (d), our PointDGMamba achieves higher accuracy than state-of-the-art methods.
  • Figure 2: The framework of PointDGMamba. It consists of three components: (a) Masked Sequence Denoising (MSD) masks out noisy point patches in the sequence and thus mitigates the adverse effects of noise accumulation during the serialization stage; (b) Sequence-wise Cross-domain Feature Aggregation (SCFA) aggregates cross-domain but same-class point cloud features with the global prompt to extract more generalized features, thereby strengthening Mamba's effectiveness in handling distribution shifts; (c) Dual-level Domain Scanning (DDS), including intra-domain scanning and cross-domain scanning, facilitates sufficient information interaction between different parts of the features.
  • Figure 3: Dual-level Domain Scanning (DDS) comprises Intra-domain Scanning and Cross-domain Scanning.
  • Figure 4: Visualization of our PointDGMamba and other state-of-the-art methods using t-SNE, where they are tested on the ShapeNet-5(C) dataset of the PointDG-3to1 benchmark. Different colors represent different classes.
  • Figure 5: Visualization of the distributions for the ablations of our PointDGMamba. We also visualize the source and target domains, with "×" markers representing the entire source domain and circles representing the target domain.
  • ...and 1 more figure