DM3D: Deformable Mamba via Offset-Guided Gaussian Sequencing for Point Cloud Understanding

Bin Liu, Chunyang Wang, Xuelian Liu

TL;DR

This work addresses point cloud understanding by learning a structure-aware serialization for State Space Models (SSMs), whose long-sequence modeling depends on input order. DM3D introduces an offset-guided deformable scanning mechanism that unifies local Gaussian resampling and global differentiable reordering. Within a Deformable Mamba Block, this is realized through Gaussian-based KNN Resampling (GKR), Gaussian-based Differentiable Reordering (GDR), and an LCFA module that supplies local contextual cues to the offset network. A Tri-Path Frequency Fusion module fuses information across the three SSM paths and reduces aliasing through frequency-domain processing. On ModelNet40, ScanObjectNN, and ShapeNetPart, DM3D reports state-of-the-art performance in classification, few-shot learning, and part segmentation. The model is trained end to end, and the authors plan to release the code.

Abstract

State Space Models (SSMs) demonstrate significant potential for long-sequence modeling, but their reliance on input order conflicts with the irregular nature of point clouds. Existing approaches often rely on predefined serialization strategies, which cannot adapt to diverse geometric structures. To overcome this limitation, we propose DM3D, a deformable Mamba architecture for point cloud understanding. Specifically, DM3D introduces an offset-guided Gaussian sequencing mechanism that unifies local resampling and global reordering within a deformable scan. The Gaussian-based KNN Resampling (GKR) enhances structural awareness by adaptively reorganizing neighboring points, while the Gaussian-based Differentiable Reordering (GDR) enables end-to-end optimization of the serialization order. Furthermore, a Tri-Path Frequency Fusion module enhances feature complementarity and reduces aliasing. Together, these components enable structure-adaptive serialization of point clouds. Extensive experiments on benchmark datasets show that DM3D achieves state-of-the-art performance in classification, few-shot learning, and part segmentation, demonstrating that adaptive serialization effectively unlocks the potential of SSMs for point cloud understanding. The code will be released at https://github.com/L1277471578/DM3D.
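
The idea of a differentiable reordering can be illustrated with a small sketch. The abstract does not give the GDR formulation, so everything below is an assumption made for illustration: the function name `gaussian_soft_reorder`, the bandwidth `sigma`, and the choice to normalize Gaussian weights over source tokens are hypothetical and not taken from the paper. Each token at index i is shifted to a continuous position i + Δt_i, and each output slot gathers a Gaussian-weighted mixture of the shifted tokens, so the output varies smoothly with the offsets instead of relying on a hard sort.

```python
import math


def gaussian_soft_reorder(tokens, offsets, sigma=0.5):
    """Softly reorder a token sequence (illustrative sketch, not the paper's GDR).

    tokens  : list of N feature vectors (lists of floats)
    offsets : list of N sequential offsets, a stand-in for the predicted Delta t
    sigma   : Gaussian bandwidth (assumed hyperparameter)
    """
    n = len(tokens)
    dim = len(tokens[0])
    # Continuous target position of every source token after shifting.
    targets = [i + offsets[i] for i in range(n)]
    reordered = []
    for slot in range(n):
        # Gaussian affinity between this output slot and each shifted token.
        weights = [
            math.exp(-((slot - t) ** 2) / (2.0 * sigma ** 2)) for t in targets
        ]
        total = sum(weights) + 1e-12  # guard against division by zero
        weights = [w / total for w in weights]
        # The output token is a weighted mixture of all source tokens.
        reordered.append(
            [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]
        )
    return reordered


# Offsets that move token 0 to slot 2, token 1 to slot 0, token 2 to slot 1.
demo = gaussian_soft_reorder([[1.0], [2.0], [3.0]], [2.0, -1.0, -1.0], sigma=0.1)
```

With a small `sigma` the mixture approaches a hard permutation (here `demo` is approximately `[[2.0], [3.0], [1.0]]`), while a larger `sigma` blends neighboring tokens. In an autodiff framework the same weights would let gradients flow back to the offsets, which is the property the abstract attributes to GDR.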

Paper Structure

This paper contains 16 sections, 23 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Illustration of our deformable scanning. The offset network predicts spatial offsets $\Delta p$ and sequential offsets $\Delta t$. Guided by the predicted offsets, a Gaussian kernel performs consistent local resampling and global reordering, yielding structure-aware sequences that capture fine-grained geometric details.
  • Figure 2: Overview of DM3D. (a) Overall architecture showing the embedding, encoder, and decoder structures. (b) The Deformable Mamba Block (DMB) consists of three SSM branches: the standard forward SSM PlainMamba (F-SSM) branch, the channel-flip backward SSM Mamba3D (C-SSM) branch, and the deformable SSM (D-SSM) branch. (c) Deformable Scan, the core of D-SSM, predicts spatial and sequential offsets via OffsetNet, enabling unified Gaussian-based resampling and differentiable reordering. "Spatial flow" and "Sequential flow" indicate operations in the spatial and sequence domains, respectively. (d) LCFA provides local contextual cues for the offset network. (e) GDR performs differentiable token reordering according to learned offsets. Symbols: $Cat$ and $Concat$ denote concatenation along the channel dimension, $\odot$ element-wise multiplication, $\oplus$ residual addition, $\otimes$ matrix multiplication, and $\sum$ summation.
  • Figure 3: Illustration of TPFF. Cross-path fusion is performed first, followed by frequency enhancement.
  • Figure 4: Visualization of the deformable mechanism. The token feature interaction matrix shows the feature weights from source tokens (x-axis) to target tokens (y-axis).
  • Figure 5: Visual results of part segmentation by DM3D and PointMamba on ShapeNetPart.
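
The local resampling step described in the Figure 1 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's GKR: the function name `gaussian_knn_resample`, the parameters `k` and `sigma`, and the use of squared Euclidean distance are all hypothetical. A query location (a point shifted by a stand-in for its spatial offset $\Delta p$) collects the features of its k nearest neighbors and averages them with Gaussian weights.

```python
import math


def gaussian_knn_resample(points, feats, query, k=3, sigma=0.2):
    """Resample a feature at `query` from its k nearest points (illustrative sketch).

    points : list of N coordinates (tuples of floats)
    feats  : list of N feature vectors (lists of floats)
    query  : shifted location p + Delta p (Delta p is a stand-in offset)
    k      : number of neighbors (assumed hyperparameter)
    sigma  : Gaussian bandwidth (assumed hyperparameter)
    """
    # Squared Euclidean distance from the query to every point.
    dist2 = [sum((a - b) ** 2 for a, b in zip(p, query)) for p in points]
    # Indices of the k closest points (sorted is stable, so ties keep input order).
    nearest = sorted(range(len(points)), key=lambda i: dist2[i])[:k]
    # Gaussian weights decay with distance from the shifted query.
    weights = [math.exp(-dist2[i] / (2.0 * sigma ** 2)) for i in nearest]
    total = sum(weights) + 1e-12  # guard against division by zero
    dim = len(feats[0])
    return [
        sum(w * feats[i][d] for w, i in zip(weights, nearest)) / total
        for d in range(dim)
    ]


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
feats = [[1.0], [3.0], [5.0], [100.0]]
# Shift the first point halfway toward the second, then resample from 2 neighbors.
resampled = gaussian_knn_resample(points, feats, (0.5, 0.0, 0.0), k=2)
```

Because the query sits midway between the first two points, their weights are equal and `resampled` is approximately `[2.0]`. The distant fourth point never enters the neighborhood, which is what keeps the resampling local.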