EdgeRunner: Auto-regressive Auto-encoder for Artistic Mesh Generation

Jiaxiang Tang; Zhaoshuo Li; Zekun Hao; Xian Liu; Gang Zeng; Ming-Yu Liu; Qinsheng Zhang

TL;DR

EdgeRunner introduces an auto-regressive auto-encoder (ArAE) with a novel EdgeBreaker-based mesh tokenizer that generates artistic meshes with up to 4,000 faces at a spatial resolution of $512^3$. By mapping variable-length meshes into a fixed-length latent space, it enables latent diffusion conditioned on point clouds or single-view images, improving generalization and cross-modal generation. The approach achieves higher quality and diversity than prior auto-regressive mesh methods, with efficient training and competitive inference times. This work advances scalable, topology-preserving 3D mesh generation for downstream creative applications.

Abstract

Current auto-regressive mesh generation methods suffer from issues such as incompleteness, insufficient detail, and poor generalization. In this paper, we propose an Auto-regressive Auto-encoder (ArAE) model capable of generating high-quality 3D meshes with up to 4,000 faces at a spatial resolution of $512^3$. We introduce a novel mesh tokenization algorithm that efficiently compresses triangular meshes into 1D token sequences, significantly enhancing training efficiency. Furthermore, our model compresses variable-length triangular meshes into a fixed-length latent space, enabling training latent diffusion models for better generalization. Extensive experiments demonstrate the superior quality, diversity, and generalization capabilities of our model in both point cloud and image-conditioned mesh generation tasks.
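To make the tokenization idea concrete, the following is a minimal sketch of the naïve baseline that a compact tokenizer improves on: each triangle is written out as 9 tokens (3 vertices × 3 quantized coordinates). It is not the authors' EdgeBreaker-based tokenizer. The function names, the use of NumPy, and the assumption that coordinates are normalized to [-1, 1] are illustrative choices; only the 512 bins per axis come from the $512^3$ resolution stated above.

```python
import numpy as np

RESOLUTION = 512  # bins per axis, matching the 512^3 resolution in the abstract


def quantize(vertices, resolution=RESOLUTION):
    """Map coordinates in [-1, 1] (an assumed normalization) to integer bins."""
    v = np.clip(np.asarray(vertices, dtype=np.float64), -1.0, 1.0)
    bins = np.floor((v + 1.0) / 2.0 * resolution).astype(np.int64)
    # The upper bound 1.0 would land in bin `resolution`; fold it into the last bin.
    return np.minimum(bins, resolution - 1)


def naive_tokenize(vertices, faces):
    """Flatten each triangle into 9 tokens: 3 vertices x 3 quantized coordinates."""
    q = quantize(vertices)  # shape (V, 3)
    # Index by face to get shape (F, 3, 3), then flatten to a 1D sequence of 9F tokens.
    return q[np.asarray(faces, dtype=np.int64)].reshape(-1)


# Two triangles sharing the edge (0, 2) of a square (hypothetical toy mesh).
vertices = [[-1, -1, -1], [1, -1, -1], [1, 1, -1], [-1, 1, -1]]
faces = [[0, 1, 2], [0, 2, 3]]
tokens = naive_tokenize(vertices, faces)
print(len(tokens))  # 2 faces x 9 tokens per face
```

In this toy mesh, vertices 0 and 2 belong to both triangles, so their coordinates are emitted twice. That repetition is the redundancy a tokenizer exploiting shared edges can avoid.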

Paper Structure (25 sections, 4 equations, 15 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Optimization-based 3D Generation
Feed-forward 3D Generation
Diffusion-based 3D Generation
Auto-regressive Mesh Generation
EdgeRunner
Compact Mesh Tokenization
Auto-regressive Auto-encoder
Image-conditioned Latent Diffusion
Experiments
Qualitative Results
Quantitative Results
Ablation Studies
Conclusion
...and 10 more sections

Figures (15)

Figure 1: EdgeRunner efficiently generates diverse, high-quality artistic meshes conditioned on point clouds or single-view images.
Figure 2: Pipeline of our method. Our ArAE model compresses variable-length mesh into fixed-length latent code, which can be further used to train latent diffusion models conditioned on other input modalities, such as single-view images.
Figure 3: Illustration of our mesh tokenizer. Our tokenizer traverses the 3D mesh triangle-by-triangle and converts it into a 1D token sequence. Through edge sharing, we reach a compression rate of 50% (4 or 5 tokens per face on average) compared to naïve tokenization of 9 tokens per face.
Figure 4: Half-edge representation for triangular faces.
Figure 5: Comparison on point cloud conditioned generation. We show the reference dense mesh and generated meshes conditioned on randomly sampled point cloud.
...and 10 more figures
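The compression figure quoted in the Figure 3 caption can be checked with simple arithmetic. The sketch below assumes a mesh at the stated 4,000-face limit and counts only per-face tokens, ignoring any special or control tokens a real tokenizer may add; the function name is hypothetical.

```python
def sequence_length(num_faces, tokens_per_face):
    """Token count when every face costs the same number of tokens (a simplification)."""
    return num_faces * tokens_per_face


MAX_FACES = 4000  # upper face count stated in the abstract

naive = sequence_length(MAX_FACES, 9)         # naive: 9 tokens per face
compact_low = sequence_length(MAX_FACES, 4)   # compact tokenizer, lower average
compact_high = sequence_length(MAX_FACES, 5)  # compact tokenizer, upper average

ratio_low = compact_low / naive
ratio_high = compact_high / naive
print(naive, compact_low, compact_high)
print(round(ratio_low, 3), round(ratio_high, 3))
```

Under these assumptions the compact sequence is between roughly 44% and 56% of the naïve length, which is consistent with the caption's figure of about 50%.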
