CRAG: Can 3D Generative Models Help 3D Assembly?

Zeyu Jiang, Sihang Li, Siqi Tan, Chenyang Xu, Juexiao Zhang, Julia Galway-Witham, Xue Wang, Scott A. Williams, Radu Iovita, Chen Feng, Jing Zhang

TL;DR

This work reformulates 3D assembly as a joint problem of assembly and generation, and shows that these two processes are mutually reinforcing: assembly provides part-level structural priors for generation, while generation injects holistic shape context that resolves ambiguities in assembly.

Abstract

Most existing 3D assembly methods treat the problem as pure pose estimation, rearranging observed parts via rigid transformations. In contrast, human assembly naturally couples structural reasoning with holistic shape inference. Inspired by this intuition, we reformulate 3D assembly as a joint problem of assembly and generation. We show that these two processes are mutually reinforcing: assembly provides part-level structural priors for generation, while generation injects holistic shape context that resolves ambiguities in assembly. Unlike prior methods that cannot synthesize missing geometry, we propose CRAG, which simultaneously generates plausible complete shapes and predicts poses for input parts. Extensive experiments demonstrate state-of-the-art performance across in-the-wild objects with diverse geometries, varying part counts, and missing pieces. Our code and models will be released.

Paper Structure (11 sections, 8 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 2: Overview of our approach, CRAG. We propose a unified framework for 3D assembly and whole-shape generation. Our model consists of two interacting branches: an Assembly Branch that predicts the pose of each part via SE(3) flow matching, and a Generation Branch that synthesizes the complete shape via flow matching. A Joint Adapter bridges these branches, enabling bidirectional information flow. We employ a two-stage training strategy: learning assembly first, then jointly finetuning both tasks.
  • Figure 3: Qualitative results across PartNeXt [wang2025partnext], Breaking Bad [sellan2022breaking], and MorphoSource [boyer2016morphosource]. We first compare methods without reference images by contrasting GARF [li2025garf], RPF [sun2025_rpf], and CRAG w/o image, where CRAG produces more coherent assemblies and more complete shapes from the same observed parts. We then compare image-conditioned methods by showing Assembler and full CRAG given the reference image, where CRAG better aligns parts and yields shapes that more closely match the ground truth.
  • Figure 4: Qualitative results on PartNeXt [wang2025partnext], Breaking Bad [sellan2022breaking], and MorphoSource [boyer2016morphosource] with missing parts. We compare Assembler, CRAG without reference images, and CRAG given a reference image. CRAG simultaneously assembles the observed parts and synthesizes a plausible, complete shape, and reference images further improve fidelity when available.
  • Figure 5: Qualitative results under ambiguous reference images on PartNeXt [wang2025partnext]. We compare image-only generation with TripoSG [li2025triposg] against CRAG by visualizing CRAG’s assembled parts and generated shapes alongside the ground truth. When the reference view is incomplete and does not reveal the full object, part-level evidence helps resolve the ambiguity to some extent and yields a better shape.
  • Figure 6: Qualitative results of CRAG on FRACTURA [li2025garf], demonstrating robustness on real-world fractures. All parts are real scanned fragments; colors are rendered for visualization.
  • ...and 1 more figures
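
The Figure 2 caption says both branches are trained with flow matching. As a rough illustration only, the sketch below shows the generic straight-line flow-matching recipe applied to a single part translation: interpolate between a noise sample and the target, regress the constant velocity of that path, and integrate the velocity field at inference time. It is not CRAG's implementation. The rotation component of SE(3), the network, and the Joint Adapter are all omitted, and every name here (`interpolate`, `target_velocity`, `flow_matching_loss`, `euler_sample`, `oracle`) is hypothetical.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 to data x1 at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Constant velocity of the straight-line path; the regression target."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predict_velocity, x0, x1, t):
    """Squared error between predicted and target velocity at one sampled time."""
    x_t = interpolate(x0, x1, t)
    v_pred = predict_velocity(x_t, t)
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true))


def euler_sample(predict_velocity, x0, steps=10):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = predict_velocity(x, i * dt)
        x = [a + dt * b for a, b in zip(x, v)]
    return x


# Assumed toy setup: one part whose ground-truth translation is `target`.
target = [0.5, -1.0, 2.0]
noise = [random.gauss(0.0, 1.0) for _ in target]


def oracle(x, t):
    # Stand-in for a trained network: returns the exact velocity for this one pair.
    return target_velocity(noise, target)


loss = flow_matching_loss(oracle, noise, target, t=0.3)
assembled = euler_sample(oracle, noise)
```

With the oracle velocity the loss is zero and the Euler integration carries the noise sample onto the target translation. A learned model would replace `oracle`, and a full pose model would also need a rotation-aware interpolation, which this sketch does not attempt.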