PartRAG: Retrieval-Augmented Part-Level 3D Generation and Editing

Peize Li, Zeyu Zhang, Hao Tang

TL;DR

PartRAG is a retrieval-augmented framework that integrates an external part database with a diffusion transformer to couple generation with an editable representation. It adds a masked, part-level editor that operates in a shared canonical space, enabling swaps, attribute refinements, and compositional updates without regenerating the whole object while preserving non-target parts and multi-view consistency.

Abstract

Single-image 3D generation with part-level structure remains challenging: learned priors struggle to cover the long tail of part geometries and maintain multi-view consistency, and existing systems provide limited support for precise, localized edits. We present PartRAG, a retrieval-augmented framework that integrates an external part database with a diffusion transformer to couple generation with an editable representation. To overcome the first challenge, we introduce a Hierarchical Contrastive Retrieval module that aligns dense image patches with 3D part latents at both part and object granularity, retrieving from a curated bank of 1,236 part-annotated assets to inject diverse, physically plausible exemplars into denoising. To overcome the second challenge, we add a masked, part-level editor that operates in a shared canonical space, enabling swaps, attribute refinements, and compositional updates without regenerating the whole object while preserving non-target parts and multi-view consistency. PartRAG achieves competitive results on Objaverse, ShapeNet, and ABO. On Objaverse, it reduces Chamfer Distance from 0.1726 to 0.1528 and raises F-Score from 0.7472 to 0.844, with inference in 38 s and interactive edits in 5-8 s. Qualitatively, PartRAG produces sharper part boundaries, better thin-structure fidelity, and robust behavior on articulated objects. Code: https://github.com/AIGeeksGroup/PartRAG. Website: https://aigeeksgroup.github.io/PartRAG.
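
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how Chamfer Distance and F-Score are commonly computed between two point clouds. It is not the paper's evaluation code. The function names, the brute-force nearest-neighbour search, the unsquared Euclidean distance, the mean-based symmetric sum, and the threshold `tau` are all assumptions made for illustration; published evaluations vary in each of these choices.

```python
import math

def nearest_dist(p, cloud):
    """Euclidean distance from point p to its nearest neighbour in cloud (brute force)."""
    return min(math.dist(p, q) for q in cloud)

def chamfer_distance(a, b):
    """Symmetric Chamfer Distance: mean nearest-neighbour distance, summed over both directions.

    Assumption: unsquared distances and a sum of the two directional means.
    """
    a_to_b = sum(nearest_dist(p, b) for p in a) / len(a)
    b_to_a = sum(nearest_dist(q, a) for q in b) / len(b)
    return a_to_b + b_to_a

def f_score(pred, gt, tau=0.1):
    """Harmonic mean of precision and recall at distance threshold tau (tau is hypothetical)."""
    precision = sum(nearest_dist(p, gt) <= tau for p in pred) / len(pred)
    recall = sum(nearest_dist(q, pred) <= tau for q in gt) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy point clouds (made-up data, not from the paper).
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.05)]

cd = chamfer_distance(pred, gt)
fs_loose = f_score(pred, gt, tau=0.1)
fs_tight = f_score(pred, gt, tau=0.01)
```

Lower Chamfer Distance and higher F-Score both indicate a closer match to the reference geometry, which is the direction of the Objaverse improvements reported in the abstract. Tightening `tau` in the toy data lowers the F-Score because the second point no longer counts as matched.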

Paper Structure

This paper contains 30 sections, 4 equations, 4 figures, 6 tables, 3 algorithms.

Figures (4)

  • Figure 1: PartRAG at a glance. Each row shows three stages. Left column: textured input image. Middle column: generated part-structured 3D meshes with crisp boundaries (shown as gray models). Right column: decomposed individual parts, enabling localized, view-consistent editing. Our retrieval-augmented framework reconstructs part-structured 3D assets across diverse categories and maintains parts in a shared canonical space for interactive manipulation.
  • Figure 2: PartRAG architecture. Top: Retrieval-augmented pipeline generating structured part meshes. Bottom: PartCrafter DiTModel Transformer, Retrieval Module, and Contrastive Learning.
  • Figure 3: Qualitative comparison. Each row shows the input photograph (left), the HoloPart and PartCrafter baselines (middle), and our PartRAG result (right). Different colors indicate different parts of the object. Our method preserves crisper part boundaries and cleaner normals across diverse categories.
  • Figure 4: Part-level editing across six examples. Our method enables localized, structure-aware edits that preserve non-target parts, honor learned attachment transforms $T_i$, coordinate multi-part changes, and maintain multi-view consistency, achieved in 5-8 s per edit.
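
The idea of a masked edit that preserves non-target parts, as described for Figure 4, can be illustrated with a simple per-part blend. This is only a sketch under stated assumptions: it treats each part as a short latent vector and uses a binary mask to decide which parts take a proposed new latent. The function name `masked_part_update`, the list-based latents, and the blend rule are hypothetical, and the attachment transforms $T_i$ are not modeled. This section does not specify how PartRAG's editor is actually implemented.

```python
def masked_part_update(original, proposal, mask):
    """Blend per-part latents: keep the original where mask is 0, take the proposal where mask is 1.

    original, proposal: lists of per-part latent vectors (lists of floats), same shape.
    mask: list of 0/1 flags, one per part (1 marks a part targeted by the edit).
    """
    if not (len(original) == len(proposal) == len(mask)):
        raise ValueError("original, proposal, and mask must have one entry per part")
    updated = []
    for z_old, z_new, m in zip(original, proposal, mask):
        # Element-wise blend; with a binary mask this selects one latent or the other.
        updated.append([m * n + (1 - m) * o for o, n in zip(z_old, z_new)])
    return updated

# Toy example with three parts and made-up 2-D latents; only the middle part is edited.
original = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
proposal = [[9.0, 9.0], [8.0, 8.0], [7.0, 7.0]]
mask = [0, 1, 0]

edited = masked_part_update(original, proposal, mask)
```

In this toy setting the first and third parts come back unchanged, which mirrors the caption's claim that non-target parts are preserved while the targeted part is replaced.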