MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds

Bingquan Dai, Li Ray Luo, Qihong Tang, Jie Wang, Xinyu Lian, Hao Xu, Minghan Qin, Xudong Xu, Bo Dai, Haoqian Wang, Zhaoyang Lyu, Jiangmiao Pang

TL;DR

MeshCoder tackles the challenge of turning 3D point clouds into editable, part-aware Blender Python scripts by designing an expressive API suite and a large-scale part-to-code dataset. It trains a multimodal, fine-tuned LLM to translate point clouds into executable scripts, then assembles part codes into complete object programs, enabling intuitive geometric and topological editing. The approach achieves superior reconstruction on 41 categories compared to strong baselines and demonstrates enhanced shape understanding when interacting with large language models, while enabling code-driven editing workflows. This work advances programmable 3D shape reconstruction and understanding, with practical implications for reverse engineering, asset creation, and design iteration.

Abstract

Reconstructing 3D objects into editable programs is pivotal for applications like reverse engineering and shape editing. However, existing methods often rely on limited domain-specific languages (DSLs) and small-scale datasets, restricting their ability to model complex geometries and structures. To address these challenges, we introduce MeshCoder, a novel framework that reconstructs complex 3D objects from point clouds into editable Blender Python scripts. We develop a comprehensive set of expressive Blender Python APIs capable of synthesizing intricate geometries. Leveraging these APIs, we construct a large-scale paired object-code dataset, where the code for each object is decomposed into distinct semantic parts. Subsequently, we train a multimodal large language model (LLM) that translates 3D point clouds into executable Blender Python scripts. Our approach not only achieves superior performance in shape-to-code reconstruction tasks but also facilitates intuitive geometric and topological editing through convenient code modifications. Furthermore, our code-based representation enhances the reasoning capabilities of LLMs in 3D shape understanding tasks. Together, these contributions establish MeshCoder as a powerful and flexible solution for programmatic 3D shape reconstruction and understanding. The project homepage is available at https://daibingquan.github.io/MeshCoder.

Paper Structure

This paper contains 29 sections, 3 equations, 29 figures, 4 tables.

Figures (29)

  • Figure 1:
  • Figure 2:
  • Figure 4: Overview of MeshCoder. The input point cloud is first encoded into shape tokens via a shape tokenizer. These tokens are then fed into a large language model (LLM), which autoregressively generates executable code representing part-based 3D structures. The decoded code specifies the object's name and the identities and names of its parts, enabling interpretable and modular reconstruction.
  • Figure 5: Visualization of basic geometric shape types and their corresponding code. For each shape category, the code shown corresponds to the first example.
  • Figure 6: Architecture of the shape tokenizer. We first project the point cloud into the triplane and obtain triplane features. The triplane features are patchified and reshaped into a 1D sequence, and fed into transformer blocks to obtain triplane tokens. Finally, we use a set of learnable tokens to aggregate information from triplane tokens via cross-attention.
  • ...and 24 more figures
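
The shape-tokenizer pipeline described in the Figure 6 caption (triplane projection, patchification into a 1D sequence, aggregation by learnable tokens via cross-attention) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, resolutions, and token counts are hypothetical, occupancy counts stand in for learned triplane features, the transformer blocks are omitted, and random vectors stand in for the learnable tokens.

```python
import numpy as np

def project_to_triplane(points, res=8):
    """Scatter points in [-1, 1]^3 onto the XY, XZ and YZ planes as occupancy counts."""
    idx = np.clip(((points + 1.0) / 2.0 * res).astype(int), 0, res - 1)
    planes = np.zeros((3, res, res))
    for k, (a, b) in enumerate([(0, 1), (0, 2), (1, 2)]):
        # np.add.at accumulates correctly when several points hit the same cell.
        np.add.at(planes[k], (idx[:, a], idx[:, b]), 1.0)
    return planes

def patchify(planes, patch=4):
    """Cut each plane into patch x patch tiles and flatten them into one 1D token sequence."""
    n, res, _ = planes.shape
    g = res // patch  # tiles per side
    tiles = planes.reshape(n, g, patch, g, patch).transpose(0, 1, 3, 2, 4)
    return tiles.reshape(n * g * g, patch * patch)

def cross_attention(queries, tokens):
    """Single-head attention: each query aggregates a softmax-weighted mix of the tokens."""
    scores = queries @ tokens.T / np.sqrt(tokens.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens, weights

# Hypothetical sizes, chosen only to keep the example small.
rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(1024, 3))
planes = project_to_triplane(points, res=8)        # shape (3, 8, 8)
tokens = patchify(planes, patch=4)                 # shape (12, 16)
queries = rng.normal(size=(4, tokens.shape[1]))    # stand-in for learnable tokens
shape_tokens, weights = cross_attention(queries, tokens)  # shape (4, 16)
```

The point of the final step is that a fixed, small number of query tokens summarizes a variable amount of triplane information, which gives the LLM a fixed-length shape prefix regardless of the input point count.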