
TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing

Stefan Lionar, Jiabin Liang, Gim Hee Lee

TL;DR

TreeMeshGPT tackles the challenge of synthesizing artist-quality 3D meshes conditioned on point clouds by introducing Autoregressive Tree Sequencing: instead of predicting the next token of a flat sequence, the model retrieves its next input token from a dynamically growing tree, traversed depth-first (DFS) and built on the triangle adjacency of the mesh. Representing each triangular face with two tokens shortens the sequence to about $22\%$ of the naive face tokenization (nine tokens per face). With $7$-bit discretization, the model scales to roughly $5{,}500$ faces under strong conditioning with $2{,}048$ point tokens. Empirically, it outperforms previous autoregressive approaches in both fidelity and normal orientation, with lower Chamfer distance (CD) and higher normal consistency (NC and |NC|) on the Objaverse and GSO datasets. The approach produces higher-detail artistic meshes suitable for real-time applications, though the authors acknowledge limitations around topology optimization and failure modes on longer sequences.
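The TL;DR reports CD and NC/|NC| without defining them. The following is a minimal sketch of one common way to compute these metrics from points (with unit normals) sampled on the predicted and ground-truth surfaces. The exact variants used in the paper (squared vs. unsquared distances, sampling density, one- or two-sided averaging) are not given here, so the choices below are assumptions; `chamfer_distance` and `normal_consistency` are hypothetical helpers, and the brute-force nearest-neighbour search is only meant for small examples.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sq_dist(p: Vec, q: Vec) -> float:
    # squared Euclidean distance between two 3D points
    return sum((a - b) ** 2 for a, b in zip(p, q))


def _nearest(p: Vec, pts: Sequence[Vec]) -> int:
    # brute-force nearest neighbour; fine for small illustrative point sets
    return min(range(len(pts)), key=lambda i: _sq_dist(p, pts[i]))


def chamfer_distance(P: Sequence[Vec], Q: Sequence[Vec]) -> float:
    """Assumed variant: mean nearest-neighbour Euclidean distance, averaged over both directions."""
    d_pq = sum(math.sqrt(_sq_dist(p, Q[_nearest(p, Q)])) for p in P) / len(P)
    d_qp = sum(math.sqrt(_sq_dist(q, P[_nearest(q, P)])) for q in Q) / len(Q)
    return 0.5 * (d_pq + d_qp)


def normal_consistency(P: Sequence[Vec], NP: Sequence[Vec],
                       Q: Sequence[Vec], NQ: Sequence[Vec],
                       absolute: bool = False) -> float:
    """Mean cosine between each point's unit normal and its nearest neighbour's normal.

    absolute=False gives NC (penalizes flipped normals);
    absolute=True gives |NC| (ignores orientation).
    """
    def one_way(A, NA, B, NB):
        vals = []
        for a, na in zip(A, NA):
            nb = NB[_nearest(a, B)]
            c = sum(x * y for x, y in zip(na, nb))
            vals.append(abs(c) if absolute else c)
        return sum(vals) / len(vals)

    return 0.5 * (one_way(P, NP, Q, NQ) + one_way(Q, NQ, P, NP))


pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
up = [(0.0, 0.0, 1.0)] * 3
down = [(0.0, 0.0, -1.0)] * 3
print(normal_consistency(pts, up, pts, down))                 # flipped normals: NC = -1.0
print(normal_consistency(pts, up, pts, down, absolute=True))  # |NC| = 1.0
```

Because NC averages signed cosines, a surface whose normals are all flipped scores -1 while |NC| still scores 1, so a gap between NC and |NC| points to normal-orientation errors rather than geometric ones.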

Abstract

We introduce TreeMeshGPT, an autoregressive Transformer designed to generate high-quality artistic meshes aligned with input point clouds. Instead of the conventional next-token prediction in autoregressive Transformers, we propose a novel Autoregressive Tree Sequencing where the next input token is retrieved from a dynamically growing tree structure that is built upon the triangle adjacency of faces within the mesh. Our sequencing enables the mesh to extend locally from the last generated triangular face at each step, which reduces training difficulty and improves mesh quality. Our approach represents each triangular face with two tokens, achieving a compression rate of approximately 22% compared to the naive face tokenization. This efficient tokenization enables our model to generate highly detailed artistic meshes with strong point cloud conditioning, surpassing previous methods in both capacity and fidelity. Furthermore, our method generates meshes with strong normal orientation constraints, minimizing the flipped normals commonly encountered in previous methods. Our experiments show that TreeMeshGPT improves mesh generation quality, with refined details and consistent normal orientation.
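To make the 22% figure concrete: a naive face tokenization flattens each triangle into the quantized x, y, z coordinates of its three vertices, i.e. nine tokens per face, whereas TreeMeshGPT uses two tokens per face, and 2/9 ≈ 22.2%. The sketch below assumes coordinates normalized to [-0.5, 0.5] and uniform 7-bit (128-bin) quantization; the range and binning rule are illustrative assumptions, and `quantize`, `naive_face_tokens`, and `tree_tokens` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]


def quantize(x: float, bits: int = 7, lo: float = -0.5, hi: float = 0.5) -> int:
    """Map a coordinate in the assumed range [lo, hi] to one of 2**bits integer bins."""
    levels = 2 ** bits
    t = (x - lo) / (hi - lo)
    return min(levels - 1, max(0, int(t * levels)))  # clamp hi into the last bin


def naive_face_tokens(face: Sequence[Vertex], bits: int = 7) -> List[int]:
    """Naive tokenization: 3 vertices x 3 quantized coordinates = 9 tokens per face."""
    return [quantize(c, bits) for v in face for c in v]


def tree_tokens(num_faces: int) -> int:
    """Two tokens per face, as stated in the abstract (auxiliary tokens ignored)."""
    return 2 * num_faces


face = [(-0.5, -0.5, 0.0), (0.5, -0.5, 0.0), (0.0, 0.5, 0.0)]
print(naive_face_tokens(face))  # [0, 0, 64, 127, 0, 64, 64, 127, 64]

num_faces = 5500
naive = 9 * num_faces           # 49,500 tokens
ours = tree_tokens(num_faces)   # 11,000 tokens
print(ours / naive)             # 2/9, about 0.222
```

At the roughly 5,500-face scale reported for 7-bit discretization, the naive scheme would need 49,500 tokens against 11,000 for two tokens per face, which is what makes strong point cloud conditioning affordable within the same context.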

Paper Structure

This paper contains 20 sections, 13 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Artistic meshes generated by TreeMeshGPT. Our method offers a novel sequencing approach for artistic mesh generation using autoregressive Transformer decoder by retrieving the next token from a dynamically growing tree structure. In our experiment with 7-bit discretization, TreeMeshGPT supports meshes with up to 5,500 triangular faces under strong point cloud conditioning.
  • Figure 2: Illustration of the sequence order in our Autoregressive Tree Sequencing. a) A small subset of a triangular mesh. [STOP] indicates a boundary. b) An equivalent tree representation of the mesh. In this tree, each node is represented as a directed edge from a pair of vertices. The root is initialized with two child nodes: $(v_0, v_1)$ and its twin $(v_1, v_0)$. A DFS traversal then produces the input-output sequence. c) Dynamic stack from the DFS traversal. The stack is initialized with $(v_0, v_1)$ and its twin $(v_1, v_0)$. The input $I_n$ is always obtained from the top of the stack. Thus, $I_1 = (v_0, v_1)$ at step 1. The opposite vertex of $I_1$ is $v_2$, and consequently $o_1$ is set to $v_2$. Two new edges are then formed by connecting the opposite vertex to the initial pair of vertices: $(v_2, v_1)$ and $(v_0, v_2)$. Their direction is enforced to be counter-clockwise with respect to the potential adjacent faces. At step 2, $I_2 = (v_2, v_1)$. Since $I_2$ is a boundary, $o_2$ is set to the [STOP] label and no new edge is added to the stack. Step 3 onwards follows the same traversal process. d) Transformer decoder sequence. The sequence in the Transformer decoder follows the input-output pairs from the tree traversal. The auxiliary tokens that initialize the generation of a connected component and the [EOS] token are also added to the input-output sequence.
  • Figure 3: Qualitative comparison on the Objaverse dataset deitke2023objaverse. Our model generates meshes with higher face counts and more refined details than the baselines. Results from the baselines use point clouds sampled from marching-cubes meshes extracted with an 8-level octree.
  • Figure 4: Qualitative comparison on the GSO dataset downs2022google.
  • Figure 5: Comparison between the decimated mesh and our output. Our model generates meshes whose topology resembles that of meshes created by 3D artists.
  • ...and 5 more figures
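The stack-based traversal described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes faces are given as counter-clockwise vertex-index triples of a consistently oriented manifold mesh, and it treats an edge whose adjacent face was already generated the same way as a boundary edge (emitting [STOP]), which is an assumption. Auxiliary tokens, [EOS], and the Transformer itself are omitted; `build_half_edge_map` and `tree_sequence` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple, Union

STOP = "[STOP]"  # stands in for the [STOP] label
Edge = Tuple[int, int]
Output = Union[int, str]


def build_half_edge_map(faces: Sequence[Tuple[int, int, int]]) -> Dict[Edge, Tuple[int, int]]:
    """Map each directed edge (a, b) of a CCW face (a, b, c) to (face index, opposite vertex c)."""
    half_edges: Dict[Edge, Tuple[int, int]] = {}
    for f, (i, j, k) in enumerate(faces):
        for a, b, c in ((i, j, k), (j, k, i), (k, i, j)):
            half_edges[(a, b)] = (f, c)
    return half_edges


def tree_sequence(faces: Sequence[Tuple[int, int, int]], root: Edge) -> List[Tuple[Edge, Output]]:
    """DFS over triangle adjacency, producing (input edge, output vertex or STOP) pairs."""
    half_edges = build_half_edge_map(faces)
    a0, b0 = root
    stack: List[Edge] = [(b0, a0), (a0, b0)]  # root edge on top, its twin below
    visited = set()
    pairs: List[Tuple[Edge, Output]] = []
    while stack:
        a, b = stack.pop()  # the input I_n is always the top of the stack
        hit = half_edges.get((a, b))
        if hit is None or hit[0] in visited:
            # boundary edge, or (assumption) the face across it was already generated
            pairs.append(((a, b), STOP))
            continue
        f, c = hit
        visited.add(f)
        pairs.append(((a, b), c))
        # two new CCW edges toward potential neighbours; (c, b) is pushed last so it is expanded next
        stack.append((a, c))
        stack.append((c, b))
    return pairs


# A single triangle (v0, v1, v2), matching steps 1 and 2 of the caption
for pair in tree_sequence([(0, 1, 2)], (0, 1)):
    print(pair)
```

For the single triangle this yields `((0, 1), 2)`, then `((2, 1), '[STOP]')`, `((0, 2), '[STOP]')`, and `((1, 0), '[STOP]')`. Since every popped edge emits exactly one pair and every generated face pushes exactly two edges, a connected component with F faces produces 2F + 2 pairs under this sketch, which is consistent with the two-tokens-per-face figure once F is large.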