PartDiffuser: Part-wise 3D Mesh Generation via Discrete Diffusion

Yichen Yang, Hong Li, Haodong Zhu, Linin Yang, Guojun Lei, Sheng Xu, Baochang Zhang

TL;DR

PartDiffuser introduces a semi-autoregressive diffusion framework for 3D mesh generation that decouples global topology from local geometry: it runs autoregression between semantic parts and parallel diffusion within each part. The method uses hierarchical geometric conditioning from a point cloud and a Part-Aware Diffusion Block to dynamically guide generation, enabling high-fidelity local details while preserving correct global structure. Empirical results show significant improvements over state-of-the-art baselines, especially on complex datasets like Objaverse, and ablations confirm the value of combining global and part-specific conditioning. The work targets artist-level 3D meshes suitable for real-world applications, and it provides a dataset construction procedure and an efficiency analysis to support future research.

Abstract

Existing autoregressive (AR) methods for generating artist-designed meshes struggle to balance global structural consistency with high-fidelity local details, and are susceptible to error accumulation. To address this, we propose PartDiffuser, a novel semi-autoregressive diffusion framework for point-cloud-to-mesh generation. The method first performs semantic segmentation on the mesh and then operates in a "part-wise" manner: it employs autoregression between parts to ensure global topology, while utilizing a parallel discrete diffusion process within each semantic part to precisely reconstruct high-frequency geometric features. PartDiffuser is based on the DiT architecture and introduces a part-aware cross-attention mechanism, using point clouds as hierarchical geometric conditioning to dynamically control the generation process, thereby effectively decoupling the global and local generation tasks. Experiments demonstrate that this method significantly outperforms state-of-the-art (SOTA) models in generating 3D meshes with rich detail, exhibiting exceptional detail representation suitable for real-world applications.
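
As a rough illustration of the part-wise scheme described in the abstract, the toy Python sketch below runs an outer autoregressive loop over parts and an inner parallel unmasking loop within each part. Everything named here is hypothetical and not taken from the paper: `toy_denoiser` returns random guesses in place of the learned DiT, and the `MASK` id, the confidence-based reveal schedule, and the fixed part lengths are illustrative assumptions. Point-cloud conditioning is omitted entirely.

```python
import math
import random

MASK = -1  # hypothetical id for the absorbing "masked" state


def toy_denoiser(context, block, vocab_size, rng):
    """Stand-in for the learned network: one (token, confidence) guess per position.

    A real model would condition on `context` (previously generated parts)
    and on point-cloud features; this toy version ignores both.
    """
    return [(rng.randrange(vocab_size), rng.random()) for _ in block]


def sample_part(context, length, vocab_size, steps, rng):
    """Denoise one part in parallel, starting from a fully masked block."""
    block = [MASK] * length
    for step in range(steps):
        masked = [i for i, tok in enumerate(block) if tok == MASK]
        if not masked:
            break
        guesses = toy_denoiser(context, block, vocab_size, rng)
        # Assumed schedule: spread the remaining masked tokens evenly over
        # the remaining steps, revealing the most confident guesses first.
        n_reveal = math.ceil(len(masked) / (steps - step))
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:n_reveal]:
            block[i] = guesses[i][0]
    return block


def sample_mesh(part_lengths, vocab_size=128, steps=4, seed=0):
    """Generate parts one after another; each part sees all earlier parts."""
    rng = random.Random(seed)
    context, parts = [], []
    for length in part_lengths:
        part = sample_part(context, length, vocab_size, steps, rng)
        parts.append(part)
        context = context + part  # autoregression happens between parts
    return parts


if __name__ == "__main__":
    for index, part in enumerate(sample_mesh([5, 3, 7])):
        print(index, part)
```

The point of the split is that the number of sequential network calls grows with the number of parts times the number of diffusion steps, not with the total token count as in token-by-token autoregression.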

Paper Structure

This paper contains 31 sections, 5 equations, 8 figures, 3 tables, 1 algorithm.

Figures (8)

  • Figure 1: Gallery of our mesh generation results.
  • Figure 2: An overview of our PartDiffuser framework. The process begins with semantic segmentation of the input point cloud using PartField [liu2025partfield]. A pre-trained point cloud encoder, Michelangelo [zhao2023michelangelo], extracts hierarchical geometric conditions. These conditions are dynamically injected via cross-attention into the Part-aware Diffusion Blocks, which guide the semi-autoregressive "Part-wise Sampling" process of our Discrete Diffusion Model to generate the final mesh.
  • Figure 3: Visualization of the composite attention mask during the parallel training phase, using $N=3$ parts as an example. This mask implements the attention scheme described in Section \ref{diffusion_block}, managing interactions across the $N$ noisy blocks, denoted $X_t$, and the $N$ clean blocks, denoted $X_0$. The legend details the four distinct attention behaviors: standard allowance for mesh tokens, specific allowance for padding tokens, and blockage enforced by either the Block Diffusion Mask or the Block-Aware Padding Mask.
  • Figure 4: Visual comparison of PartDiffuser with Baselines.
  • Figure 5: An example of the ablation study.
  • ...and 3 more figures
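
To make the Figure 3 caption more concrete, the Python sketch below builds one plausible version of the mask over the concatenated sequence of noisy blocks $X_t$ followed by clean blocks $X_0$. The attention rule is an assumption borrowed from the usual block-diffusion training setup, not a rule confirmed by this page: a noisy block attends to itself and to strictly earlier clean blocks, clean blocks attend block-causally to clean blocks, and clean tokens never attend to noisy ones. The function name `block_diffusion_mask` is hypothetical, and the padding behaviors named in the caption are not modeled because their exact rules are not given here.

```python
def block_diffusion_mask(block_lengths):
    """Return a 2L x 2L boolean matrix; True means the query row may attend
    to the key column.

    Assumed layout: positions [0, L) hold the noisy blocks X_t and
    positions [L, 2L) hold the clean blocks X_0, in the same part order.
    """
    # Part index of every token position within one copy of the sequence.
    block_id = [b for b, n in enumerate(block_lengths) for _ in range(n)]
    total = len(block_id)
    mask = [[False] * (2 * total) for _ in range(2 * total)]
    for q in range(2 * total):
        for k in range(2 * total):
            q_noisy, k_noisy = q < total, k < total
            qb, kb = block_id[q % total], block_id[k % total]
            if q_noisy and k_noisy:
                allowed = qb == kb   # noisy part attends within itself
            elif q_noisy and not k_noisy:
                allowed = kb < qb    # noisy part sees earlier clean parts
            elif not q_noisy and not k_noisy:
                allowed = kb <= qb   # clean parts are block-causal
            else:
                allowed = False      # clean tokens never see noisy tokens
            mask[q][k] = allowed
    return mask


if __name__ == "__main__":
    # N = 3 parts with 2, 1 and 2 tokens respectively.
    for row in block_diffusion_mask([2, 1, 2]):
        print("".join("#" if allowed else "." for allowed in row))
```

Under this rule, all $N$ noisy parts can be trained in a single forward pass, because each noisy part only ever sees clean copies of the parts that would already exist at sampling time.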