Meshtron: High-Fidelity, Artist-Like 3D Mesh Generation at Scale

Zekun Hao; David W. Romero; Tsung-Yi Lin; Ming-Yu Liu

TL;DR

Meshtron tackles the challenge of generating high-fidelity, artist-like 3D meshes at scale directly from point clouds. It introduces an hourglass Transformer backbone, truncated-sequence training with sliding-window inference, and robust enforcement of mesh-sequence ordering to scale up to 64K faces at 1024-level coordinate resolution. The model achieves substantial memory savings and throughput gains, while delivering superior topology, detail, and generalization compared with prior artist-like mesh generators and iso-surface methods. This work advances AI-assisted 3D asset creation for games, film, and virtual environments by enabling realistic, controllable remeshing at unprecedented scales.

Abstract

Meshes are fundamental representations of 3D surfaces. However, creating high-quality meshes is a labor-intensive task that requires significant time and expertise in 3D modeling. While a delicate object often requires over $10^4$ faces to be accurately modeled, recent attempts at generating artist-like meshes are limited to 1.6K faces and heavy discretization of vertex coordinates. Hence, scaling both the maximum face count and vertex coordinate resolution is crucial to producing high-quality meshes of realistic, complex 3D objects. We present Meshtron, a novel autoregressive mesh generation model able to generate meshes with up to 64K faces at 1024-level coordinate resolution, over an order of magnitude higher face count and $8\times$ higher coordinate resolution than current state-of-the-art methods. Meshtron's scalability is driven by four key components: (1) an hourglass neural architecture, (2) truncated sequence training, (3) sliding window inference, and (4) a robust sampling strategy that enforces the order of mesh sequences. This results in over 50% less training memory, $2.5\times$ faster throughput, and better consistency than existing works. Meshtron generates meshes of detailed, complex 3D objects at unprecedented levels of resolution and fidelity, closely resembling those created by professional artists, and opening the door to more realistic generation of detailed 3D assets for animation, gaming, and virtual environments.
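To give a feel for the scale these numbers imply, the following is a minimal sketch of coordinate quantization and sequence-length arithmetic. It is not the authors' implementation. It assumes that each triangle is serialized as 9 coordinate tokens (3 vertices times 3 coordinates), that coordinates are normalized to the bounding box before binning, and that "64K" is read as 64,000; none of these details is stated in the abstract. The function names `quantize_vertices`, `mesh_to_tokens`, and `sequence_length` are hypothetical.

```python
import numpy as np


def quantize_vertices(vertices, levels=1024):
    """Map float coordinates to integer bins in [0, levels - 1].

    Assumption: the mesh is scaled uniformly by its longest bounding-box
    side, so the aspect ratio is preserved.
    """
    v = np.asarray(vertices, dtype=np.float64)
    lo = v.min(axis=0)
    extent = (v.max(axis=0) - lo).max()
    if extent == 0:
        extent = 1.0  # degenerate mesh: avoid division by zero
    unit = (v - lo) / extent  # every coordinate now lies in [0, 1]
    bins = np.round(unit * (levels - 1))
    return np.clip(bins, 0, levels - 1).astype(np.int64)


def mesh_to_tokens(vertices, faces, levels=1024):
    """Flatten a triangle mesh into one token per vertex coordinate.

    Assumption: 9 tokens per face, with no special ordering applied here.
    """
    quantized = quantize_vertices(vertices, levels)
    triangles = quantized[np.asarray(faces)]  # shape (F, 3, 3)
    return triangles.reshape(-1)


def sequence_length(num_faces, tokens_per_face=9):
    """Token count of a mesh under the 9-tokens-per-face assumption."""
    return num_faces * tokens_per_face


if __name__ == "__main__":
    verts = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    tris = [[0, 1, 2]]
    print(mesh_to_tokens(verts, tris).tolist())
    print(sequence_length(1_600), sequence_length(64_000))
```

Under these assumptions, a 1.6K-face mesh is a sequence of 14,400 tokens, while a 64K-face mesh is 576,000 tokens. That gap suggests why the abstract pairs the higher face count with memory-saving measures such as truncated sequence training and sliding window inference. The $8\times$ resolution gain likewise implies that prior methods use 128 levels per axis, since $1024 / 8 = 128$.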
