DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

Ruowen Zhao, Junliang Ye, Zhengyi Wang, Guangce Liu, Yiwen Chen, Yikai Wang, Jun Zhu

TL;DR

DeepMesh tackles two obstacles in auto-regressive artist-mesh generation: long token sequences and the lack of alignment with human preferences. It introduces an efficient mesh tokenization with block-wise coordinate indexing, a pre-training pipeline that combines data curation with truncated training, and a DPO-based RL stage that aligns outputs with human preferences. Using 5k preference pairs and an Hourglass Transformer backbone, it generates high-fidelity, topologically rich meshes conditioned on point clouds or images, surpassing state-of-the-art baselines in both geometry and aesthetics. The result is a scalable, human-aligned path to high-detail 3D meshes for creative and practical applications.

Abstract

Triangle meshes play a crucial role in 3D applications for efficient manipulation and rendering. While auto-regressive methods generate structured meshes by predicting discrete vertex tokens, they are often constrained by limited face counts and mesh incompleteness. To address these challenges, we propose DeepMesh, a framework that optimizes mesh generation through two key innovations: (1) an efficient pre-training strategy incorporating a novel tokenization algorithm, along with improvements in data curation and processing, and (2) the introduction of Reinforcement Learning (RL) into 3D mesh generation to achieve human preference alignment via Direct Preference Optimization (DPO). We design a scoring standard that combines human evaluation with 3D metrics to collect preference pairs for DPO, ensuring both visual appeal and geometric accuracy. Conditioned on point clouds and images, DeepMesh generates meshes with intricate details and precise topology, outperforming state-of-the-art methods in both precision and quality. Project page: https://zhaorw02.github.io/DeepMesh/
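To make the preference-alignment step concrete, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not taken from the DeepMesh code: the function name, argument names, and the value of `beta` are illustrative assumptions, and the inputs are assumed to be summed token log-probabilities of the preferred and rejected mesh sequences under the trainable policy and a frozen reference model.

```python
import math


def dpo_pair_loss(policy_logp_chosen, policy_logp_rejected,
                  ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    All names and the default beta are hypothetical; inputs are assumed
    to be sequence-level log-probabilities (sums over tokens).
    """
    # Implicit reward of each sample: log-ratio of policy to reference.
    chosen_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_ratio - rejected_ratio)

    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: loss equals log(2).
baseline = dpo_pair_loss(-10.0, -10.0, -10.0, -10.0)

# Policy raises the chosen mesh and lowers the rejected one: loss drops.
improved = dpo_pair_loss(-8.0, -12.0, -10.0, -10.0)
```

The loss shrinks as the policy assigns relatively more probability to the preferred mesh than the reference model does, which is how the annotated preference pairs would shape the generator under this formulation.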

Paper Structure

This paper contains 37 sections, 4 equations, 16 figures, 3 tables.

Figures (16)

  • Figure 1: Gallery of DeepMesh's generation results. DeepMesh efficiently generates aesthetic, artist-like meshes conditioned on the given point cloud.
  • Figure 2: An overview of our method. DeepMesh is an auto-regressive transformer composed of both self-attention and cross-attention layers. The model is pre-trained on discrete mesh tokens generated by our improved tokenization algorithm. To further enhance the quality of results, we propose a scoring standard that combines 3D metrics with human evaluation. With this standard, we annotate 5,000 preference pairs and then post-train the model with DPO to align its outputs with human preferences.
  • Figure 3: Distribution of face counts in the training dataset. The dataset size is approximately 500k, with an average face count of 8k.
  • Figure 4: Some examples of the collected preference pairs. We annotate the preferred meshes based on their geometry completeness, surface details and wireframe structure.
  • Figure 5: Qualitative comparison of point-cloud-conditioned generation between DeepMesh and baselines. DeepMesh outperforms the baselines in both generated geometry and preservation of fine-grained details. Meshes generated by our method have many more faces than those of the baselines.
  • ...and 11 more figures