Scaling Mesh Generation via Compressive Tokenization
Haohan Weng, Zibo Zhao, Biwen Lei, Xianghui Yang, Jian Liu, Zeqiang Lai, Zhuo Chen, Yuhong Liu, Jie Jiang, Chunchao Guo, Tong Zhang, Shenghua Gao, C. L. Philip Chen
TL;DR
This work introduces Blocked and Patchified Tokenization (BPT), a compressive mesh representation that reduces mesh sequence length by about 75% through block-wise indexing and patch-based aggregation, enabling training on meshes with over 8k faces. Coupled with an autoregressive foundation Transformer conditioned on point clouds and images, the approach achieves state-of-the-art performance in point-cloud-to-mesh generation and produces detailed meshes aligned with input images. The paper demonstrates that scaling the training data with high-poly meshes improves generation quality and robustness, and it provides extensive ablations on tokenization parameters and data usage. Overall, BPT enables scalable, high-detail native mesh generation and broadens practical applications in 3D asset creation.
Abstract
We propose a compressive yet effective mesh representation, Blocked and Patchified Tokenization (BPT), which enables the generation of meshes exceeding 8k faces. BPT compresses mesh sequences through block-wise indexing and patch aggregation, reducing their length by approximately 75% compared to the original sequences. This level of compression makes it possible to train on mesh data with significantly more faces, enriching geometric detail and improving generation robustness. Building on BPT, we train a foundation mesh generative model on scaled mesh data that supports flexible conditioning on point clouds and images. Our model generates meshes with intricate details and accurate topology, achieves state-of-the-art performance in mesh generation, and reaches a quality level suitable for direct use in products.
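The abstract names block-wise indexing without spelling out the encoding. The following is a minimal sketch under our own assumptions, not the authors' implementation. Vertex coordinates are assumed to be quantized to integers in [0, 128). Each axis is split into 8 blocks of 16 offsets, so a vertex becomes one block token plus one offset token, and the block token is emitted only when it differs from the previous vertex's block. The names `split_vertex`, `encode`, and `decode` and the grid sizes are hypothetical. The sketch omits patch aggregation entirely, so its token counts say nothing about the ~75% figure reported above.

```python
from typing import List, Optional, Tuple

Vertex = Tuple[int, int, int]


def split_vertex(v: Vertex, offset_size: int, blocks_per_axis: int) -> Tuple[int, int]:
    """Split a quantized vertex into a flattened (block_id, offset_id) pair (hypothetical scheme)."""
    bx, by, bz = (c // offset_size for c in v)  # which block the vertex falls in, per axis
    ox, oy, oz = (c % offset_size for c in v)   # position inside that block, per axis
    block_id = (bx * blocks_per_axis + by) * blocks_per_axis + bz
    offset_id = (ox * offset_size + oy) * offset_size + oz
    return block_id, offset_id


def encode(vertices: List[Vertex], offset_size: int, blocks_per_axis: int) -> List[int]:
    """Flatten vertices into tokens: block ids in [0, B^3), offset ids shifted past them."""
    num_blocks = blocks_per_axis ** 3
    tokens: List[int] = []
    prev_block: Optional[int] = None
    for v in vertices:
        block_id, offset_id = split_vertex(v, offset_size, blocks_per_axis)
        if block_id != prev_block:  # emit a block token only when the block changes
            tokens.append(block_id)
            prev_block = block_id
        tokens.append(num_blocks + offset_id)
    return tokens


def decode(tokens: List[int], offset_size: int, blocks_per_axis: int) -> List[Vertex]:
    """Invert encode(): the most recent block token applies to each following offset token."""
    num_blocks = blocks_per_axis ** 3
    b, o = blocks_per_axis, offset_size
    vertices: List[Vertex] = []
    block = 0  # encode() always emits a block token before the first offset token
    for t in tokens:
        if t < num_blocks:
            block = t
            continue
        off = t - num_blocks
        bx, by, bz = block // (b * b), (block // b) % b, block % b
        ox, oy, oz = off // (o * o), (off // o) % o, off % o
        vertices.append((bx * o + ox, by * o + oy, bz * o + oz))
    return vertices


if __name__ == "__main__":
    # Toy vertex sequence: the first three vertices share one block, the last does not.
    toy = [(17, 3, 40), (18, 5, 41), (20, 3, 44), (100, 90, 7)]
    tokens = encode(toy, offset_size=16, blocks_per_axis=8)
    assert decode(tokens, offset_size=16, blocks_per_axis=8) == toy
    print(len(tokens), 3 * len(toy))  # prints "6 12": block-indexed vs. one token per coordinate
```

In this toy case, the saving comes from vertices that are spatially close sharing a block token. The design trade-off is vocabulary size: here the model would need 8³ = 512 block tokens plus 16³ = 4096 offset tokens, in exchange for a shorter sequence.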
