ProcGen3D: Learning Neural Procedural Graph Representations for Image-to-3D Reconstruction

Xinyi Zhang, Daoyi Gao, Naiqi Li, Angela Dai

TL;DR

ProcGen3D tackles image-to-3D reconstruction by learning compact procedural graph representations of objects that a procedural generator can decode into rich meshes. It uses a GPT-style transformer to model edge-based tokens of a procedural graph, conditioned on input images, and introduces Monte Carlo Tree Search at test time to guide generation toward image-faithful reconstructions. The approach achieves state-of-the-art or competitive results across cactus, tree, and bridge categories, outperforming state-of-the-art generative 3D baselines and domain-specific modeling techniques while generalizing well to real-world images despite training only on synthetic data. By combining a structured, interpretable graph representation with image-guided search, ProcGen3D enables high-fidelity, production-ready 3D assets with improved geometric sharpness and local detail.

Abstract

We introduce ProcGen3D, a new approach for 3D content creation by generating procedural graph abstractions of 3D objects, which can then be decoded into rich, complex 3D assets. Inspired by the prevalent use of procedural generators in production 3D applications, we propose a sequentialized, graph-based procedural graph representation for 3D assets. We use this to learn to approximate the landscape of a procedural generator for image-based 3D reconstruction. We employ edge-based tokenization to encode the procedural graphs, and train a transformer prior to predict the next token conditioned on an input RGB image. Crucially, to enable better alignment of our generated outputs to an input image, we incorporate Monte Carlo Tree Search (MCTS) guided sampling into our generation process, steering output procedural graphs towards more image-faithful reconstructions. Our approach is applicable across a variety of objects that can be synthesized with procedural generators. Extensive experiments on cacti, trees, and bridges show that our neural procedural graph generation outperforms both state-of-the-art generative 3D methods and domain-specific modeling techniques. Furthermore, this enables improved generalization on real-world input images, despite training only on synthetic data.
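As a rough illustration of what edge-based tokenization of a procedural graph could look like, the sketch below turns each edge into one token holding the quantized positions and attributes of its two endpoint vertices plus an edge attribute. The names `Vertex`, `Edge`, `quantize`, and `tokenize_graph`, the `radius` and `kind` attributes, the bin count, and the assumption that coordinates are normalized to [0, 1] are all hypothetical choices for this example, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

NUM_BINS = 128  # assumed quantization resolution (illustrative only)


@dataclass(frozen=True)
class Vertex:
    position: Tuple[float, float, float]  # assumed normalized to [0, 1]
    radius: float                         # hypothetical vertex attribute in [0, 1]


@dataclass(frozen=True)
class Edge:
    start: int  # index of the first endpoint vertex
    end: int    # index of the second endpoint vertex
    kind: int   # hypothetical discrete edge attribute


def quantize(value: float, num_bins: int = NUM_BINS) -> int:
    """Map a value in [0, 1] to an integer bin in [0, num_bins - 1]."""
    clamped = min(max(value, 0.0), 1.0)
    return min(int(clamped * num_bins), num_bins - 1)


def tokenize_graph(vertices: Sequence[Vertex],
                   edges: Sequence[Edge]) -> List[Tuple[int, ...]]:
    """Emit one token per edge: both endpoints' data, then the edge attribute."""
    tokens = []
    for edge in edges:
        fields: List[int] = []
        for index in (edge.start, edge.end):
            vertex = vertices[index]
            fields.extend(quantize(c) for c in vertex.position)
            fields.append(quantize(vertex.radius))
        fields.append(edge.kind)
        tokens.append(tuple(fields))
    return tokens
```

Because every token carries both endpoint positions, the graph connectivity can be recovered from the sequence by matching identical quantized vertices, which is one plausible reason to tokenize by edge rather than by vertex.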

Paper Structure

This paper contains 29 sections, 4 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Left: procedural generators employ rule-based generation with stochastic sampling to produce abstract graph representations that are decoded into high-fidelity 3D assets through geometry and material assignment. Right: We propose to leverage such procedural graph representations, modeling their distribution with a transformer to enable high-fidelity image-to-3D reconstruction for various categories of procedurally generated objects (cacti, trees, bridges).
  • Figure 2: Overview of our graph-based transformer. A procedural graph is tokenized into a sequence of edge-based tokens, where each token encodes the positions and attributes of its two endpoint vertices as well as the attributes of the edge itself. The transformer autoregressively predicts tokens conditioned on image features, enabling reconstruction of the underlying procedural graph.
  • Figure 3: Overview of our MCTS-guided search to reconstruct a procedural graph that aligns well with the input image condition.
  • Figure 4: Qualitative comparison with state-of-the-art 3D generative models TRELLIS and Wonder3D. ProcGen3D achieves higher-fidelity reconstructions on all categories.
  • Figure 5: Comparison with baselines on real-world cactus, leafy tree, and bridge images. The results demonstrate that our model generalizes effectively to real-world data, despite being trained only on synthetic data.
  • ...and 4 more figures
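
The MCTS-guided search summarized in Figure 3 can be sketched generically as follows. This is a minimal toy version, not the authors' implementation: it assumes a hypothetical `prior_fn` (standing in for the image-conditioned transformer's next-token distribution) and a hypothetical `reward_fn` (standing in for an image-alignment score of the decoded graph), and it uses a standard PUCT-style selection rule with rollouts sampled from the prior. The paper's actual reward, selection rule, and rollout policy are not given in this summary.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple


class Node:
    """One partial token sequence in the search tree."""

    def __init__(self, tokens: List[int], prior: float,
                 parent: Optional["Node"] = None) -> None:
        self.tokens = tokens
        self.prior = prior
        self.parent = parent
        self.children: Dict[int, "Node"] = {}
        self.visits = 0
        self.value_sum = 0.0

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def mcts_generate(prior_fn: Callable[[List[int]], Sequence[float]],
                  reward_fn: Callable[[List[int]], float],
                  max_len: int, num_simulations: int = 200,
                  c_puct: float = 1.0, seed: int = 0) -> Tuple[List[int], float]:
    """Return the best complete sequence seen during search and its reward."""
    rng = random.Random(seed)
    root = Node([], 1.0)
    best_tokens: List[int] = []
    best_reward = float("-inf")
    for _ in range(num_simulations):
        node = root
        # Selection: follow the highest PUCT score until reaching a leaf.
        while node.children:
            parent = node
            node = max(
                parent.children.values(),
                key=lambda ch: ch.mean_value()
                + c_puct * ch.prior * math.sqrt(parent.visits + 1) / (1 + ch.visits),
            )
        # Expansion: add one child per candidate next token.
        if len(node.tokens) < max_len:
            for token, prob in enumerate(prior_fn(node.tokens)):
                node.children[token] = Node(node.tokens + [token], prob, node)
        # Rollout: complete the sequence by sampling from the prior.
        rollout = list(node.tokens)
        while len(rollout) < max_len:
            probs = prior_fn(rollout)
            rollout.append(rng.choices(range(len(probs)), weights=probs)[0])
        reward = reward_fn(rollout)
        if reward > best_reward:
            best_tokens, best_reward = rollout, reward
        # Backpropagation: update statistics along the selected path.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    return best_tokens, best_reward


# Toy stand-ins: a uniform prior over 4 tokens and a reward that counts
# positions matching a hidden target sequence.
TARGET = [2, 0, 3, 1]


def toy_prior(prefix: List[int]) -> List[float]:
    return [0.25, 0.25, 0.25, 0.25]


def toy_reward(tokens: List[int]) -> float:
    return sum(a == b for a, b in zip(tokens, TARGET)) / len(TARGET)
```

Calling `mcts_generate(toy_prior, toy_reward, max_len=4)` returns the highest-reward sequence found within the simulation budget. In a real setting the reward would compare a rendering of the decoded asset against the input image, which makes each simulation far more expensive than in this toy.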