PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance

Haohan Weng, Yikai Wang, Tong Zhang, C. L. Philip Chen, Jun Zhu

TL;DR

PivotMesh tackles the challenge of scalable native mesh generation by integrating a Transformer-based auto-encoder to tokenize meshes and a pivot-vertex guided auto-regressive model that first predicts coarse pivot vertices and then full mesh tokens. The method uses face-to-vertex hierarchical decoding and discrete token representations via residual vector quantization to preserve geometry and surface continuity. It achieves state-of-the-art performance on ShapeNet and Objaverse across reconstruction and generation metrics, supports conditional generation, and enables mesh variation and refinement. Limitations include controllability and scaling constraints, with potential gains from larger datasets and additional conditioning signals in future work.
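
To make the residual vector quantization step mentioned above concrete, here is a minimal pure-Python sketch of the general technique. It is not the paper's implementation: the two toy codebooks, the 2-D latent, and the function names `rvq_encode` and `rvq_decode` are illustrative assumptions. Each stage quantizes whatever the previous stages failed to capture, so the token list gets more precise with every stage.

```python
def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def dist(entry):
        return sum((a - b) ** 2 for a, b in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def rvq_encode(vec, codebooks):
    """Quantize vec with a stack of codebooks; each stage encodes the residual."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        # Subtract the chosen entry so the next stage sees only the leftover error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices

def rvq_decode(indices, codebooks):
    """Reconstruct the vector by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

# Hypothetical toy setup (not from the paper): 2 stages, 4 entries each.
codebooks = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],      # coarse stage
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [0.25, 0.25]],  # fine stage
]
latent = [0.9, 0.3]
tokens = rvq_encode(latent, codebooks)   # one discrete token per stage
recon = rvq_decode(tokens, codebooks)
print(tokens, recon)
```

In this toy case the coarse stage alone would reconstruct `[1.0, 0.0]`; the second stage corrects the second coordinate, which is the sense in which the stages are "residual".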

Abstract

Generating compact and sharply detailed 3D meshes poses a significant challenge for current 3D generative models. Unlike methods that extract dense meshes from neural representations, some recent works model the native mesh distribution (i.e., a set of triangles) directly, which yields more compact results that resemble human-crafted meshes. However, due to the complexity and variety of mesh topology, these methods are typically limited to small datasets with specific categories and are hard to extend. In this paper, we introduce PivotMesh, a generic and scalable mesh generation framework that makes an initial attempt to extend native mesh generation to large-scale datasets. We employ a Transformer-based auto-encoder to encode meshes into discrete tokens and decode them hierarchically from face level to vertex level. Subsequently, to model the complex topology, we first learn to generate pivot vertices as a coarse mesh representation and then generate the complete mesh tokens with the same auto-regressive Transformer. This reduces the difficulty compared with directly modeling the mesh distribution and further improves model controllability. PivotMesh demonstrates its versatility by learning effectively from both small datasets such as ShapeNet and large-scale datasets such as Objaverse and Objaverse-XL. Extensive experiments indicate that PivotMesh can generate compact and sharp 3D meshes across various categories, highlighting its great potential for native mesh modeling.
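
The coarse-to-fine ordering described in the abstract can be pictured as a single token sequence with the pivot vertices as a prefix. The sketch below is only an illustration under stated assumptions: the abstract does not say how pivots are chosen, so ranking vertices by how many faces touch them is an assumed criterion, and the 128-bin coordinate quantization, the `<sep>` token, and the placeholder mesh token ids are all hypothetical.

```python
from collections import Counter

def select_pivots(faces, num_pivots):
    """Pick the most-connected vertices (assumed criterion: faces per vertex)."""
    degree = Counter(v for face in faces for v in face)
    # Descending degree, ties broken by vertex index for determinism.
    ranked = sorted(degree, key=lambda v: (-degree[v], v))
    return sorted(ranked[:num_pivots])

def quantize(coord, bins=128):
    """Map a coordinate in [-1, 1] to an integer bin (resolution is an assumption)."""
    return min(bins - 1, int((coord + 1.0) / 2.0 * bins))

def build_sequence(pivot_tokens, mesh_tokens, sep="<sep>"):
    """Coarse pivot prefix first, then the complete mesh tokens, in one sequence."""
    return list(pivot_tokens) + [sep] + list(mesh_tokens)

# Toy mesh: the four side faces of a square pyramid; vertex 4 is the apex.
vertices = [(-1.0, -1.0, 0.0), (1.0, -1.0, 0.0), (1.0, 1.0, 0.0),
            (-1.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
faces = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]

pivots = select_pivots(faces, num_pivots=1)
pivot_tokens = [quantize(c) for v in pivots for c in vertices[v]]

# Placeholder ids standing in for the auto-encoder's mesh tokens (hypothetical).
mesh_tokens = [11, 42, 7, 19]
sequence = build_sequence(pivot_tokens, mesh_tokens)
print(pivots, sequence)
```

Because the pivot prefix and the mesh tokens share one sequence, a single auto-regressive model can be trained on it with ordinary next-token prediction, and a user-supplied prefix gives a natural handle for conditioning.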

Paper Structure (36 sections, 4 equations, 14 figures, 3 tables)

This paper contains 36 sections, 4 equations, 14 figures, 3 tables.

Figures (14)

  • Figure 1: Unlike 3D generation methods based on neural representations such as InstantMesh (Xu et al., 2024), our method generates compact and sharp meshes with far fewer faces when producing similar shapes.
  • Figure 2: The overall method of PivotMesh. (a) Triangle mesh sequences are tokenized into mesh tokens and hierarchically decoded from face level to vertex level via our mesh auto-encoder. (b) The auto-regressive Transformer first learns to generate pivot vertices as coarse mesh representation and then generates the complete mesh tokens in a coarse-to-fine manner.
  • Figure 3: Qualitative comparison of unconditional generation on ShapeNet. Each row shows one ShapeNet subset (bench, chair, lamp, table).
  • Figure 4: Qualitative comparison of unconditional generation on Objaverse.
  • Figure 5: Shape novelty analysis on the Objaverse dataset. We show the 3 nearest neighbors, measured in Chamfer Distance (CD), for generated shapes (left). We plot the distribution of 500 generated shapes from our method against their minimum CD to the training set (right). Shapes at the 50th percentile look different from the closest training shape. This indicates that our method not only covers shapes in the training set (low CD values) but also creates novel and realistic shapes (high CD values).
  • ...and 9 more figures
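
The novelty analysis in the Figure 5 caption relies on a nearest-neighbor search under Chamfer Distance. The following is a small sketch of that idea, not the authors' evaluation code. It assumes each shape is already given as a list of 3D points and uses the mean-of-squared-distances form of CD; other conventions (unsquared distances, sums instead of means) exist, and the caption does not say which one the paper uses. The function names and toy point clouds are hypothetical.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer Distance: mean squared nearest-neighbor distance, both ways."""
    def sq(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))
    def one_way(src, dst):
        return sum(min(sq(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)

def nearest_training_shapes(generated, training_set, k=3):
    """Return (distance, index) pairs for the k training shapes closest to `generated`."""
    scored = sorted(
        (chamfer_distance(generated, shape), i) for i, shape in enumerate(training_set)
    )
    return scored[:k]

# Toy point clouds standing in for points sampled from mesh surfaces.
generated = [(0, 0, 0), (1, 0, 0)]
training_set = [
    [(0, 0, 0), (1, 0, 0)],   # identical to the generated shape
    [(0, 0, 0), (1, 0, 1)],   # slightly different
    [(5, 5, 5), (6, 5, 5)],   # far away
]

neighbors = nearest_training_shapes(generated, training_set, k=2)
min_cd = neighbors[0][0]      # minimum CD to the training set
print(neighbors, min_cd)
```

Collecting `min_cd` over many generated shapes gives the kind of distribution the caption describes: values near zero suggest a shape close to a training example, while larger values suggest a shape that differs from everything in the training set.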