MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization

Yiwen Chen, Yikai Wang, Yihao Luo, Zhengyi Wang, Zilong Chen, Jun Zhu, Chi Zhang, Guosheng Lin

TL;DR

MeshAnything V2 tackles the tokenization bottleneck in autoregressive Artist-Created Mesh (AM) generation by introducing Adjacent Mesh Tokenization (AMT), which represents each face with a single vertex whenever possible and inserts an interruption token when no adjacent face is available. Integrated into a shape-conditioned decoder-only transformer, AMT cuts the token sequence length by about half on average and doubles the maximum face count to 1600, improving both efficiency and generation quality on Objaverse data. Two further enhancements, a face count condition and masking of invalid predictions, improve controllability and robustness. Experiments show qualitative and quantitative gains over prior tokenization schemes, supporting AMT as a core driver of scalable, shape-aligned AM generation.

Abstract

Meshes are the de facto 3D representation in the industry but are labor-intensive to produce. Recently, a line of research has focused on autoregressively generating meshes. This approach processes meshes into a sequence composed of vertices and then generates them vertex by vertex, similar to how a language model generates text. These methods have achieved some success but still struggle to generate complex meshes. One primary reason for this limitation is their inefficient tokenization methods. To address this issue, we introduce MeshAnything V2, an advanced mesh generation model designed to create Artist-Created Meshes that align precisely with specified shapes. A key innovation behind MeshAnything V2 is our novel Adjacent Mesh Tokenization (AMT) method. Unlike traditional approaches that represent each face using three vertices, AMT optimizes this by employing a single vertex wherever feasible, effectively reducing the token sequence length by about half on average. This not only streamlines the tokenization process but also results in more compact and well-structured sequences, enhancing the efficiency of mesh generation. With these improvements, MeshAnything V2 effectively doubles the face limit compared to previous models, delivering superior performance without increasing computational costs. We will make our code and models publicly available. Project Page: https://buaacyw.github.io/meshanything-v2/
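
The length reduction can be illustrated with a simplified count. This is a sketch under stated assumptions, not a result taken from the paper: lengths are measured in vertex units rather than coordinate-level tokens, the interruption token is counted as one unit, and the symbols $n$ (number of faces), $k$ (number of interruptions), $T_{\text{face}}$ and $T_{\text{AMT}}$ are introduced here for illustration only. Per-face tokenization spends three vertices on every face. Under AMT, $k$ interruptions split the sequence into $k+1$ runs; each run opens with a full face (three vertices), every remaining face adds one vertex, and each interruption adds one token:

$$
T_{\text{face}} = 3n, \qquad
T_{\text{AMT}} = 3(k+1) + \bigl(n-(k+1)\bigr) + k = n + 3k + 2, \qquad
\frac{T_{\text{AMT}}}{T_{\text{face}}} = \frac{n + 3k + 2}{3n}.
$$

With no interruptions ($k=0$) the ratio approaches $1/3$ for large $n$. Under this simplified count, a ratio of about one half would correspond to roughly one interruption for every six faces ($k \approx n/6$).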

Paper Structure

This paper contains 17 sections, 4 equations, 2 figures, 3 tables, 1 algorithm.

Figures (2)

  • Figure 1: Equipped with the newly proposed Adjacent Mesh Tokenization (AMT), MeshAnything V2 significantly surpasses MeshAnything (Chen et al., 2024) in both performance and efficiency. MeshAnything V2 generates Artist-Created Meshes (AM) with up to $1600$ faces, aligned with given shapes. Combined with various 3D asset production pipelines, it efficiently achieves high-quality, highly controllable AM generation.
  • Figure 2: Illustration of Adjacent Mesh Tokenization (AMT). Unlike previous methods that use three vertices to represent a face, AMT uses a single vertex whenever possible. When this is impossible, AMT adds a special token & and restarts. Our experiments demonstrate that AMT reduces the token sequence length by half on average. Its compact and well-structured sequence representation enhances sequence learning, thereby significantly improving both the efficiency and performance of mesh generation.
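
The idea in the Figure 2 caption can be sketched in a few lines of Python. This is a minimal illustration under several assumptions, not the authors' implementation: vertices are plain integer indices rather than discretized coordinates, faces are visited in input order with no canonical sorting, "adjacent" is taken to mean that an unvisited face shares the last two emitted vertices, and face winding is not preserved. The names `amt_tokenize`, `amt_detokenize`, `INTERRUPT` and `example_faces` are hypothetical.

```python
from typing import List, Tuple, Union

Face = Tuple[int, int, int]
Token = Union[int, str]
INTERRUPT = "&"  # special token that marks a restart


def amt_tokenize(faces: List[Face]) -> List[Token]:
    """Greedy AMT-style tokenization of non-degenerate triangles."""
    visited = [False] * len(faces)
    tokens: List[Token] = []
    for start in range(len(faces)):
        if visited[start]:
            continue
        if tokens:
            # No adjacent face was found for the previous run: interrupt.
            tokens.append(INTERRUPT)
        tokens.extend(faces[start])  # a new run opens with all three vertices
        visited[start] = True
        while True:
            a, b = tokens[-2], tokens[-1]
            # Look for an unvisited face that shares the last two vertices.
            nxt = next(
                (i for i, f in enumerate(faces)
                 if not visited[i] and a in f and b in f),
                None,
            )
            if nxt is None:
                break
            # Only the one vertex not already in the sequence is emitted.
            tokens.append(next(v for v in faces[nxt] if v != a and v != b))
            visited[nxt] = True
    return tokens


def amt_detokenize(tokens: List[Token]) -> List[Face]:
    """Rebuild faces: inside a run, every 3 consecutive vertices form a face."""
    faces: List[Face] = []
    run: List[int] = []
    for t in tokens + [INTERRUPT]:  # trailing sentinel flushes the last run
        if t == INTERRUPT:
            faces.extend(
                (run[i], run[i + 1], run[i + 2]) for i in range(len(run) - 2)
            )
            run = []
        else:
            run.append(t)
    return faces


# A strip of three adjacent triangles plus one disconnected triangle.
example_faces: List[Face] = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (5, 6, 7)]
example_tokens = amt_tokenize(example_faces)
print(example_tokens, "vs", 3 * len(example_faces), "vertices per-face")
```

In this toy mesh the first three faces are covered by five vertices, and the disconnected face costs one interruption token plus three vertices. A production tokenizer would also need a deterministic ordering and a rule for choosing among several adjacent faces, both of which this sketch leaves out.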