Scaling Mesh Generation via Compressive Tokenization

Haohan Weng, Zibo Zhao, Biwen Lei, Xianghui Yang, Jian Liu, Zeqiang Lai, Zhuo Chen, Yuhong Liu, Jie Jiang, Chunchao Guo, Tong Zhang, Shenghua Gao, C. L. Philip Chen

TL;DR

This work introduces Blocked and Patchified Tokenization (BPT), a compressive mesh representation that reduces mesh sequence length by about 75% through block-wise indexing and patch-based aggregation, enabling training on meshes with over 8k faces. Coupled with a foundation autoregressive Transformer conditioned on point clouds and images, the approach achieves state-of-the-art performance for point-cloud to mesh generation and produces detailed meshes aligned with input images. The paper demonstrates that scaling training data with high-poly meshes improves generation quality and robustness, and provides extensive ablations on tokenization parameters and data usage. Overall, BPT unlocks scalable, high-detail native mesh generation and broadens practical applications for 3D asset creation.

Abstract

We propose a compressive yet effective mesh representation, Blocked and Patchified Tokenization (BPT), facilitating the generation of meshes exceeding 8k faces. BPT compresses mesh sequences by employing block-wise indexing and patch aggregation, reducing their length by approximately 75% compared to the original sequences. This compression milestone unlocks the potential to utilize mesh data with significantly more faces, thereby enhancing detail richness and improving generation robustness. Empowered by BPT, we have built a foundation mesh generative model trained on scaled mesh data to support flexible control with point clouds and images. Our model demonstrates the capability to generate meshes with intricate details and accurate topology, achieving SoTA performance on mesh generation and reaching the level required for direct product usage.
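To make the 75% figure concrete, here is a rough back-of-the-envelope sketch. It assumes the "original sequence" spends 9 tokens per triangle (three vertices, three quantized coordinates each); that baseline is an assumption for illustration and is not stated in the abstract. The function name `estimated_tokens` is hypothetical.

```python
def estimated_tokens(num_faces, tokens_per_face=9, reduction=0.75):
    """Estimate sequence lengths before and after compression.

    tokens_per_face=9 is an ASSUMED baseline (3 vertices x 3 coordinates);
    reduction=0.75 is the approximate figure quoted in the abstract.
    """
    baseline = num_faces * tokens_per_face
    compressed = round(baseline * (1 - reduction))
    return baseline, compressed


baseline, compressed = estimated_tokens(8000)
print(f"8k faces: about {baseline} tokens uncompressed, about {compressed} after compression")
```

Under these assumptions, an 8k-face mesh drops from roughly 72k tokens to roughly 18k, which suggests why longer, more detailed meshes become tractable for an autoregressive Transformer.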

Paper Structure

This paper contains 42 sections, 6 equations, 11 figures, 3 tables, 1 algorithm.

Figures (11)

  • Figure 1: Generated meshes conditioned on images or point clouds sampled from dense meshes. Our model can generate meshes with up to 8k faces based on the proposed compressive tokenization. The dense meshes or images at the lower right show the conditions.
  • Figure 2: Scaling data for mesh generation with BPT. (a) Existing models can only handle meshes with at most 4k faces, which still lack intricate details. Empowered by BPT, our model can leverage meshes exceeding 8k faces, effectively extending the training scope for mesh generation. (b) We train the same model on meshes with different maximum numbers of faces. As the number of mesh faces increases, the performance of mesh generation significantly improves, highlighting the value of high-poly training data. The generation performance is measured by the Hausdorff distance between the input point cloud and generated meshes (a lower distance indicates a better performance).
  • Figure 3: The proposed Blocked and Patchified Tokenization (BPT). (a) We convert the coordinates from the Cartesian system to block-wise indexes. The coordinates are first separated equally into several blocks. Then, vertices inside each block are located with 1-dim indexes. (b) The nearby faces are aggregated as patches to compress the mesh sequence. Each patch center is set as the vertex connected with the most unvisited faces. Subsequently, other vertices within the patch are included in the subsequence to create a complete patch.
  • Figure 4: The average vertex distance with previous $t$ vertices (denoted as AVD@t, lower is better). Among various context lengths $t$, BPT achieves the lowest AVD, showing its locality for effective mesh modeling.
  • Figure 5: Comparison on point-cloud conditional generation. All the meshes are generated conditioned on point clouds sampled from dense meshes. Our model can recover the details of dense meshes while maintaining high-quality topology.
  • ...and 6 more figures
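
The block-wise indexing described in the Figure 3(a) caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the grid resolution (128), the number of blocks per axis (8), the row-major flattening order, and the function names `to_block_index` and `from_block_index` are all assumptions chosen for the example.

```python
def to_block_index(x, y, z, resolution=128, blocks_per_axis=8):
    """Map quantized Cartesian coordinates to (block index, in-block index).

    ASSUMED setup: coordinates are integers in [0, resolution), and the
    grid is split equally into blocks_per_axis blocks along each axis.
    """
    cell = resolution // blocks_per_axis  # side length of one block
    bx, ox = divmod(x, cell)              # which block, and offset inside it
    by, oy = divmod(y, cell)
    bz, oz = divmod(z, cell)
    # Flatten each 3D position into a single 1-dim index (row-major order).
    block = (bx * blocks_per_axis + by) * blocks_per_axis + bz
    offset = (ox * cell + oy) * cell + oz
    return block, offset


def from_block_index(block, offset, resolution=128, blocks_per_axis=8):
    """Inverse of to_block_index: recover the quantized (x, y, z)."""
    cell = resolution // blocks_per_axis
    bxy, bz = divmod(block, blocks_per_axis)
    bx, by = divmod(bxy, blocks_per_axis)
    oxy, oz = divmod(offset, cell)
    ox, oy = divmod(oxy, cell)
    return bx * cell + ox, by * cell + oy, bz * cell + oz


block, offset = to_block_index(17, 40, 100)
assert from_block_index(block, offset) == (17, 40, 100)
```

In this sketch each vertex takes two tokens (block and offset) rather than three coordinate tokens, and the mapping is lossless. Any further savings would come from the patch aggregation described in Figure 3(b), which this example does not attempt to model.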