MamTiff-CAD: Multi-Scale Latent Diffusion with Mamba+ for Complex Parametric Sequence

Liyuan Deng, Yunpeng Bai, Yongkang Dai, Xiaoshui Huang, Hongping Gan, Dongshuo Huang, Hao Jiacheng, Yilei Shi

TL;DR

MamTiff-CAD tackles the challenge of generating long, complex parametric CAD command sequences by combining a forget-gate-enhanced Mamba+ encoder, a Transformer-based decoder, and a Multi-Scale Transformer diffusion generator that operates in latent space. The two-stage framework first maps CAD sequences into a compact latent space and then learns their distribution with diffusion in that space, enabling robust reconstruction and high-quality unconditional generation for sequences of up to $N_c=256$ commands. A new ABC-256 dataset of $13{,}705$ long sequences (60–256 commands) supports evaluation; on it, MamTiff-CAD outperforms prior CAD models in both autoencoding and generation, and it also generalizes to Fusion 360 data. These results advance industrial CAD design by enabling scalable, coherent generation of long parametric sequences and opening avenues for integrating Brep/CSG representations and interactive design workflows.
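
The second stage of the two-stage pipeline learns a distribution over autoencoder latents and, at inference, samples a latent that the decoder turns into a command sequence. The sketch below is a minimal NumPy illustration of that stage, assuming a standard DDPM with epsilon-prediction, a linear noise schedule and T = 1000 steps; none of these hyperparameters are stated in this summary. `eps_model` is a hypothetical stand-in for the multi-scale Transformer denoiser.

```python
import numpy as np

T = 1000                                   # diffusion steps (assumption)
betas = np.linspace(1e-4, 0.02, T)         # linear noise schedule (assumption)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)             # bar(alpha)_t = prod_{s<=t} alpha_s

def q_sample(z0, t, eps):
    """Forward (noising) process: z_t = sqrt(abar_t) z0 + sqrt(1 - abar_t) eps."""
    return np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def diffusion_loss(eps_model, z0, rng):
    """Epsilon-prediction loss for one latent at a random timestep."""
    t = int(rng.integers(T))
    eps = rng.normal(size=z0.shape)
    z_t = q_sample(z0, t, eps)
    return float(np.mean((eps - eps_model(z_t, t)) ** 2))

def sample_latent(eps_model, shape, rng):
    """Ancestral DDPM sampling: start from N(0, I) and denoise for T steps."""
    z = rng.normal(size=shape)
    for t in reversed(range(T)):
        eps_hat = eps_model(z, t)
        mean = (z - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
        # Add noise with variance beta_t on every step except the last one.
        z = (mean + np.sqrt(betas[t]) * rng.normal(size=shape)) if t > 0 else mean
    return z
```

In this sketch a generated latent `z = sample_latent(eps_model, latent_shape, rng)` would then be passed to the trained decoder. Because diffusion runs on compact latents rather than on 256 raw commands, each denoising step works on a much smaller tensor.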

Abstract

Parametric Computer-Aided Design (CAD) is crucial in industrial applications, yet existing approaches often struggle to generate long sequences of parametric commands because of the geometric and topological constraints of complex CAD models. To address this challenge, we propose MamTiff-CAD, a novel framework for generating CAD parametric command sequences that leverages a Transformer-based diffusion model over multi-scale latent representations. Specifically, we design a novel autoencoder that integrates Mamba+ and Transformer to map parameterized CAD sequences into latent representations. The Mamba+ block incorporates a forget gate mechanism to effectively capture long-range dependencies, and a non-autoregressive Transformer decoder reconstructs CAD sequences from the latent representations. A diffusion model based on a multi-scale Transformer is then trained on these latent embeddings to learn the distribution of long command sequences. We also construct a dataset of long parametric sequences with up to 256 commands per CAD model. Experiments demonstrate that MamTiff-CAD achieves state-of-the-art performance on both reconstruction and generation tasks, confirming its effectiveness for generating CAD models with long command sequences (60–256 commands).
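
The abstract says only that Mamba+ adds a forget gate to capture long-range dependencies; the exact equations are not given here. The sketch below is one plausible form, offered as an assumption rather than the paper's implementation. It uses a diagonal selective state-space scan in the Mamba style, with a simplified Euler discretization of the input term, whose output is mixed with the block input by a sigmoid gate, `out = g*y + (1-g)*x`. The gate form, shapes and the function name are hypothetical.

```python
import numpy as np

def selective_scan_with_forget_gate(x, delta, A, B, C, z):
    """
    Hypothetical Mamba+-style block core (assumed form, not the paper's code).
    x, delta, z: (L, D) inputs, positive step sizes, and gate logits per channel.
    A: (D, N) negative state decay rates; B, C: (L, N) input-dependent projections.
    Returns (L, D).
    """
    L, D = x.shape
    h = np.zeros((D, A.shape[1]))                 # hidden state per channel
    out = np.empty_like(x)
    for t in range(L):
        dA = np.exp(delta[t][:, None] * A)        # (D, N) discretized decay
        dB = delta[t][:, None] * B[t][None, :]    # (D, N) Euler input term
        h = dA * h + dB * x[t][:, None]           # causal recurrent update
        y = h @ C[t]                              # (D,) state readout
        g = 1.0 / (1.0 + np.exp(-z[t]))           # forget gate in (0, 1)
        out[t] = g * y + (1.0 - g) * x[t]         # gate mixes SSM output and raw input
    return out
```

The recurrence is causal and linear in L, which is why state-space encoders scale to 256-command sequences more cheaply than full self-attention; the gate lets the block fall back on the raw input when the long-range state is unhelpful.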

Paper Structure

This paper contains 23 sections, 18 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Gallery of Generated CAD Designs. Our generative model infers parametric CAD command sequences, enabling the creation of diverse and structurally valid CAD models. The resulting 3D shapes exhibit clean geometry, well-defined features, and full editability, allowing users to modify designs seamlessly.
  • Figure 2: The framework of MamTiff-CAD consists of two main steps. In Step 1, an autoencoder integrating Mamba and Transformer architectures is used to learn the latent representation of CAD command sequences. In Step 2, a diffusion-based module is employed to model the generative distribution of the learned latent representations. During inference, the model directly generates latent variables, which are subsequently decoded to reconstruct the CAD command sequences.
  • Figure 3: An overview of our autoencoder architecture. The input CAD command sequence is first parameterized and fused with command, parameter, and positional embeddings. It is then processed through four Mamba+ blocks and mapped to the latent representation $Z$ via a compression block. Finally, the Transformer decoder reconstructs the CAD sequence, predicting command types and their corresponding parameters.
  • Figure 4: Our denoising diffusion model. The input noise undergoes linear projection with positional and timestep embeddings, followed by feature extraction through encoder blocks (×6). The core denoising structure, MST with Adaptive Fusion, integrates multi-scale attention and adaptive fusion to dynamically adjust feature distributions. Finally, the MLP Head reconstructs the denoised output features.
  • Figure 5: This figure compares the CAD models generated by DeepCAD, SkexGen, HNC-CAD, and MamTiff-CAD, highlighting differences in shape complexity and geometric details among the methods. It can be observed that MamTiff-CAD produces models with higher structural integrity and greater complexity.
  • ...and 6 more figures
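
Figure 3 states that each parameterized command is fused with command, parameter, and positional embeddings before the Mamba+ blocks. The sketch below is a minimal NumPy illustration of that fusion. It assumes a DeepCAD-style encoding: parameters normalized to [-1, 1] and quantized to 256 levels, 16 parameter slots per command with -1 marking unused slots, and additive fusion of the three embeddings. The six command types, width `D = 32`, and randomly initialized tables are hypothetical stand-ins for learned weights.

```python
import numpy as np

rng = np.random.default_rng(0)
N_C, N_P, LEVELS, D = 256, 16, 256, 32   # max length, param slots, quant levels, width
N_CMD_TYPES = 6                          # hypothetical command vocabulary size

# Random stand-ins for learned embedding tables.
cmd_table = rng.normal(size=(N_CMD_TYPES, D))
param_table = rng.normal(size=(LEVELS + 1, D))        # row 0 reserved for unused (-1)
param_proj = rng.normal(size=(N_P * D, D)) / np.sqrt(N_P * D)
pos_table = rng.normal(size=(N_C, D))

def quantize(x):
    """Map normalized values in [-1, 1] to integer levels 0..LEVELS-1."""
    return np.round((np.clip(x, -1.0, 1.0) + 1.0) / 2.0 * (LEVELS - 1)).astype(int)

def embed(cmd_ids, params):
    """cmd_ids: (L,) ints; params: (L, N_P) ints in [-1, LEVELS-1]. Returns (L, D)."""
    L = len(cmd_ids)
    e_cmd = cmd_table[cmd_ids]                                          # command type
    e_par = param_table[params + 1].reshape(L, N_P * D) @ param_proj    # shift -1 to row 0
    e_pos = pos_table[:L]                                               # position in sequence
    return e_cmd + e_par + e_pos
```

Summing the three embeddings keeps the width fixed at `D`; concatenating them and projecting would be an equally plausible alternative.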
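
Figure 4 names the core denoising block "MST with Adaptive Fusion" but this summary does not spell out its internals. The sketch below shows one plausible reading, as an assumption rather than the paper's design. Single-head self-attention runs on the latent tokens at several resolutions (average-pooled by factors 1, 2 and 4). Each branch is upsampled back to full length, and the branches are fused with softmax weights computed from the input itself, so the mix of scales adapts per sample. All weight matrices, scale factors and function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)           # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention on x: (L, D)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def pool(x, s):
    """Average-pool tokens in non-overlapping windows of size s (s must divide L)."""
    L, D = x.shape
    return x.reshape(L // s, s, D).mean(axis=1)

def unpool(x, s):
    """Nearest-neighbour upsampling back to the original length."""
    return np.repeat(x, s, axis=0)

def multi_scale_attention(x, weights, scales, fusion_proj):
    """
    x: (L, D) latent tokens; weights: one (Wq, Wk, Wv) triple per scale;
    scales: pooling factors dividing L; fusion_proj: (D, len(scales)).
    """
    branches = [unpool(self_attention(pool(x, s), *w), s) for s, w in zip(scales, weights)]
    alpha = softmax(x.mean(axis=0) @ fusion_proj)     # input-dependent weights summing to 1
    fused = sum(a * b for a, b in zip(alpha, branches))
    return x + fused                                   # residual connection
```

Coarse branches see summaries of neighbouring tokens, which is cheaper and captures longer-range structure, while the scale-1 branch keeps fine detail; the adaptive weights decide how much each contributes.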