
MeshXL: Neural Coordinate Field for Generative 3D Foundation Models

Sijin Chen, Xin Chen, Anqi Pang, Xianfang Zeng, Wei Cheng, Yijun Fu, Fukun Yin, Yanru Wang, Zhibin Wang, Chi Zhang, Jingyi Yu, Gang Yu, Bin Fu, Tao Chen

TL;DR

MeshXL tackles direct generation of high-fidelity 3D meshes by representing them as auto-regressive sequences through NeurCF, an explicit coordinate representation with implicit neural embeddings. The models are decoder-only transformers trained on next-coordinate prediction, scaling from 125M to 1.3B parameters and trained on a corpus of 2.5M meshes with about $150$ billion tokens; image and text conditioning is supplied via a fixed $32$-token prefix. Empirical results on ShapeNet and Objaverse show that MeshXL surpasses prior methods in diversity and quality, human studies favor MeshXL outputs, and the models support $\mathcal{X}$-to-Mesh and texture generation. Together, NeurCF and MeshXL establish a scalable 3D foundation-model framework for conditional mesh generation and downstream content creation; the authors acknowledge inference-time limitations and point to directions for speedups.

Abstract

The polygon mesh representation of 3D data offers great flexibility, fast rendering speed, and storage efficiency, and is widely preferred in various applications. However, given its unstructured graph representation, the direct generation of high-fidelity 3D meshes is challenging. Fortunately, with a pre-defined ordering strategy, 3D meshes can be represented as sequences, and the generation process can be treated as an auto-regressive problem. In this paper, we validate that the Neural Coordinate Field (NeurCF), an explicit coordinate representation with implicit neural embeddings, is a simple-yet-effective representation for large-scale sequential mesh modeling. We then present MeshXL, a family of generative pre-trained auto-regressive models that addresses 3D mesh generation with modern large language model approaches. Extensive experiments show that MeshXL is able to generate high-quality 3D meshes and can also serve as a foundation model for various downstream applications.
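
The idea of turning a mesh into a sequence with a pre-defined ordering can be illustrated with a minimal sketch. It is not the authors' implementation: the bin count of 128, the assumption that coordinates are already normalized to [-1, 1], the lexicographic face ordering, and the helper names `quantize` and `mesh_to_sequence` are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]


def quantize(value: float, n_bins: int = 128) -> int:
    """Map a coordinate in [-1, 1] to an integer bin in [0, n_bins - 1].

    The range and the bin count are illustrative assumptions.
    """
    clipped = min(max(value, -1.0), 1.0)
    return min(int((clipped + 1.0) / 2.0 * n_bins), n_bins - 1)


def mesh_to_sequence(
    vertices: Sequence[Vertex],
    faces: Sequence[Face],
    n_bins: int = 128,
) -> List[int]:
    """Flatten a triangle mesh into one list of discrete coordinate tokens."""
    # Quantize every vertex onto the integer grid.
    grid = [tuple(quantize(c, n_bins) for c in v) for v in vertices]

    ordered_faces = []
    for face in faces:
        corners = [grid[i] for i in face]
        # Rotate the face so its smallest corner comes first; a rotation
        # keeps the winding order (and thus the face orientation) intact.
        start = corners.index(min(corners))
        ordered_faces.append(tuple(corners[start:] + corners[:start]))

    # Sort faces lexicographically so the same mesh always yields
    # the same sequence (the assumed "pre-defined ordering").
    ordered_faces.sort()

    # Each triangle contributes 3 corners x 3 coordinates = 9 tokens.
    return [c for face in ordered_faces for corner in face for c in corner]


# Two triangles sharing an edge.
example_vertices = [(-1.0, -1.0, -1.0), (1.0, -1.0, -1.0),
                    (-1.0, 1.0, -1.0), (1.0, 1.0, 1.0)]
example_faces = [(1, 2, 3), (0, 1, 2)]
example_tokens = mesh_to_sequence(example_vertices, example_faces)
```

Under these assumptions, a sequence model trained on next-coordinate prediction would see `example_tokens[:k]` as context and `example_tokens[k]` as the target at each position `k`.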

Paper Structure (46 sections, 2 equations, 9 figures, 5 tables)


Figures (9)

  • Figure 1: MeshXL can auto-regressively generate high-quality 3D meshes. We validate that Neural Coordinate Field (NeurCF), an explicit coordinate representation with implicit neural embeddings, is a simple-yet-effective sequence representation for large-scale mesh modeling.
  • Figure 2: Mesh Representation. We present the Neural Coordinate Field (NeurCF) to encode the discretized coordinates in the Euclidean space. Benefiting from NeurCF and a pre-defined ordering strategy, our proposed MeshXL can directly generate the unstructured 3D mesh auto-regressively.
  • Figure 3: Training and Validation Perplexity (PPL) for MeshXL Models. We train all the models from scratch on 150 billion tokens. We observe that performance improves as model size grows.
  • Figure 4: Evaluation of Partial Mesh Completion. Given some partial observation of the 3D mesh (white), MeshXL is able to produce diverse object completion results (blue).
  • Figure 5: Evaluation of $\mathcal{X}$-to-mesh generation. We show that MeshXL can generate high-quality 3D meshes given the corresponding image or text as additional input.
  • ...and 4 more figures
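
Figure 3 reports perplexity (PPL). As a reminder of how that metric is usually defined, the sketch below computes it as the exponential of the mean negative log-likelihood per token. The function name `perplexity` and the use of natural-log probabilities are assumptions for illustration; the paper's exact evaluation code is not shown here.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp(mean negative log-likelihood) over the given tokens.

    `token_log_probs` holds the natural-log probability the model
    assigned to each ground-truth token (an assumed input format).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 0.5 to every correct token.
example_ppl = perplexity([math.log(0.5)] * 9)
```

Lower is better: a model that always assigns probability 1 to the correct token has perplexity 1, while uniform guessing over N possible tokens gives perplexity N.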