PolyDiff: Generating 3D Polygonal Meshes with Diffusion Models

Antonio Alliegro, Yawar Siddiqui, Tatiana Tommasi, Matthias Nießner

TL;DR

PolyDiff is a discrete diffusion framework that generates 3D polygonal meshes directly: it models each mesh as a quantized triangle soup and denoises it with a transformer-based network. Because it operates in the discrete vertex-coordinate space, it learns vertex positions and face topology jointly, with no post-processing step, and it reports average FID and JSD improvements of 18.2 and 5.8 over prior methods on ShapeNet categories. The paper also presents ablation results favoring discrete over continuous diffusion and qualitative evidence of novel, coherent meshes, and it notes limitations such as scene-level generation and sampling speed. Overall, PolyDiff offers a practical, diffusion-based path to high-fidelity 3D meshes directly in the mesh domain, with potential to reduce artist workload in downstream pipelines.

Abstract

We introduce PolyDiff, the first diffusion-based approach capable of directly generating realistic and diverse 3D polygonal meshes. In contrast to methods that use alternate 3D shape representations (e.g. implicit representations), our approach is a discrete denoising diffusion probabilistic model that operates natively on the polygonal mesh data structure. This enables learning of both the geometric properties of vertices and the topological characteristics of faces. Specifically, we treat meshes as quantized triangle soups, progressively corrupted with categorical noise in the forward diffusion phase. In the reverse diffusion phase, a transformer-based denoising network is trained to revert the noising process, restoring the original mesh structure. At inference, new meshes can be generated by applying this denoising network iteratively, starting with a completely noisy triangle soup. Consequently, our model is capable of producing high-quality 3D polygonal meshes, ready for integration into downstream 3D workflows. Our extensive experimental analysis shows that PolyDiff achieves a significant advantage (avg. FID and JSD improvement of 18.2 and 5.8 respectively) over current state-of-the-art methods.
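
The representation and forward corruption described above can be illustrated with a minimal sketch. It is not the authors' implementation: the bin count (`NUM_BINS = 256`), the coordinate range of [-1, 1], the uniform resampling kernel, and the function names `quantize_soup` and `corrupt` are all assumptions made for illustration. The abstract states only that meshes are quantized triangle soups corrupted with categorical noise.

```python
import random

NUM_BINS = 256  # assumed quantization resolution; not stated in the text above


def quantize_soup(faces, num_bins=NUM_BINS):
    """Map a triangle soup with coordinates in [-1, 1] to integer bins.

    `faces` is a list of triangles; each triangle is three (x, y, z) tuples.
    Returns a flat list of 9 * len(faces) integers in [0, num_bins - 1].
    """
    tokens = []
    for tri in faces:
        for vertex in tri:
            for c in vertex:
                c = min(max(c, -1.0), 1.0)  # clamp to the assumed range
                # Scale to [0, num_bins - 1] and round to the nearest bin.
                tokens.append(int((c + 1.0) / 2.0 * (num_bins - 1) + 0.5))
    return tokens


def corrupt(tokens, keep_prob, num_bins=NUM_BINS, rng=random):
    """Categorical corruption with an assumed uniform kernel.

    Each token is kept with probability `keep_prob`; otherwise it is
    resampled uniformly over all bins. In a diffusion schedule, `keep_prob`
    would shrink toward 0 as the timestep grows.
    """
    return [t if rng.random() < keep_prob else rng.randrange(num_bins)
            for t in tokens]


# One triangle: 3 vertices x 3 coordinates = 9 discrete tokens.
triangle = [((-1.0, -1.0, -1.0), (1.0, 1.0, 1.0), (0.0, 0.0, 0.0))]
clean = quantize_soup(triangle)
noisy = corrupt(clean, keep_prob=0.5)
```

With `keep_prob=1.0` the tokens are returned unchanged, and with `keep_prob=0.0` every token is drawn uniformly at random, which corresponds to the completely noisy triangle soup that sampling starts from.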

Paper Structure (9 sections, 7 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: We propose PolyDiff, a novel 3D generative approach that operates natively on polygonal meshes. Meshes are treated as quantized triangle soups which are progressively corrupted with categorical noise. The process is then reverted by a transformer-based denoiser trained to restore the original mesh. PolyDiff is the first diffusion-based model able to generate realistic and diverse 3D polygonal meshes.
  • Figure 2: PolyDiff represents meshes as quantized triangle soups. Each mesh face is composed of three vertices, with each vertex being represented by a triplet of discrete coordinate values. For each mesh, the noising process incrementally alters the categorical values of the vertex coordinates over several timesteps. The noised version of the mesh is input to a transformer network which is tasked with predicting the clean, uncorrupted mesh at any given timestep. The training is driven by a cross-entropy loss between the network's predicted vertex coordinates and those of the original, uncorrupted mesh.
  • Figure 3: Novel shape generation vs nearest neighbor retrieval. Each row shows one sample generated by our PolyDiff (left, blue mesh) and the top-3 nearest neighbors (green) from the training set based on the Chamfer distance. The examples show that our method can generate novel shapes.
  • Figure 4: Qualitative comparison of generated meshes. The instances generated by PolyDiff are noticeably cleaner and more realistic than those of the competitors that show small or overlapping triangles, artifacts, and missing parts.
  • Figure 5: Qualitative comparison between our PolyDiff discrete diffusion model and a variant that employs standard continuous Gaussian noise. The continuous version exhibits noticeably poorer performance, affirming our hypothesis that discrete diffusion is better suited to the inherently discrete characteristics of mesh data.
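
The nearest-neighbor check in Figure 3 relies on the Chamfer distance. A minimal sketch of such a retrieval is given below, under stated assumptions: shapes are compared as point sets (for example, points sampled from the mesh surfaces), the distance uses mean squared nearest-neighbor distances in both directions, and the names `chamfer_distance` and `top_k_neighbors` are hypothetical. The caption does not specify which Chamfer variant or sampling scheme the authors use.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets.

    `a` and `b` are non-empty lists of (x, y, z) tuples. This variant sums
    the two directed means of squared nearest-neighbor distances.
    """
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        # Mean, over src, of the squared distance to the closest point in dst.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def top_k_neighbors(query, training_set, k=3):
    """Indices of the k training shapes closest to `query`, nearest first."""
    order = sorted(range(len(training_set)),
                   key=lambda i: chamfer_distance(query, training_set[i]))
    return order[:k]


# Toy data: the query and four "training shapes" shifted along the y axis.
query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
training_set = [
    [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0)],
    [(0.0, 5.0, 0.0), (1.0, 5.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.5, 0.0), (1.0, 0.5, 0.0)],
]
neighbors = top_k_neighbors(query, training_set, k=3)
```

If the smallest distance to the training set is clearly above zero, the generated shape is not a copy of a training example, which is the argument the figure makes visually. The brute-force search here is quadratic in the number of points and is meant only to show the idea.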