PartRAG: Retrieval-Augmented Part-Level 3D Generation and Editing

Peize Li; Zeyu Zhang; Hao Tang

TL;DR

PartRAG is a retrieval-augmented framework that integrates an external part database with a diffusion transformer to couple generation with an editable representation. It adds a masked, part-level editor that operates in a shared canonical space, enabling part swaps, attribute refinements, and compositional updates without regenerating the whole object, while preserving non-target parts and multi-view consistency.
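The masked editor is described here only at a high level. The following is a minimal sketch of the general idea, assuming each object is stored as a list of per-part latent vectors and that editing follows an inpainting-style masked denoising loop (a common diffusion technique, not confirmed as PartRAG's exact procedure). The denoiser proposes updates for every part, but only parts flagged as targets accept them, so non-target parts stay exactly as they were. The names `masked_part_edit`, `denoise_step`, `toy_denoiser`, and `GOAL` are hypothetical.

```python
from typing import Callable, List, Sequence

Latent = List[float]


def masked_part_edit(
    parts: Sequence[Sequence[float]],
    target: Sequence[bool],
    init: Sequence[Sequence[float]],
    denoise_step: Callable[[List[Latent], int], List[Latent]],
    num_steps: int,
) -> List[Latent]:
    """Hypothetical inpainting-style edit over per-part latents.

    Only parts flagged in `target` take the denoiser's updates; every other part
    is reset to its original latent after each step, so it is preserved exactly.
    """
    if not (len(parts) == len(target) == len(init)):
        raise ValueError("parts, target and init must have the same length")
    # Target parts start from `init` (e.g. noise or a retrieved exemplar latent).
    state = [list(z0) if t else list(z) for z, t, z0 in zip(parts, target, init)]
    for step in reversed(range(num_steps)):
        proposal = denoise_step(state, step)
        # Keep the proposal for target parts only; clamp non-target parts.
        state = [list(p) if t else list(z) for p, z, t in zip(proposal, parts, target)]
    return state


# Toy usage: a stand-in "denoiser" that pulls every latent halfway toward GOAL.
GOAL = [1.0, 1.0]


def toy_denoiser(latents: List[Latent], step: int) -> List[Latent]:
    return [[0.5 * (x + g) for x, g in zip(z, GOAL)] for z in latents]


parts = [[0.0, 0.0], [5.0, 5.0], [2.0, -2.0]]
edited = masked_part_edit(
    parts, [False, True, False], [[0.0, 0.0]] * 3, toy_denoiser, num_steps=10
)
print(edited)  # [[0.0, 0.0], [0.9990234375, 0.9990234375], [2.0, -2.0]]
```

Clamping after every step, rather than only at the end, keeps the denoiser's view of the non-target parts fixed throughout, which is what lets an edit stay local.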

Abstract

Single-image 3D generation with part-level structure remains challenging: learned priors struggle to cover the long tail of part geometries and to maintain multi-view consistency, and existing systems offer limited support for precise, localized edits. We present PartRAG, a retrieval-augmented framework that integrates an external part database with a diffusion transformer to couple generation with an editable representation. To address the first challenge, we introduce a Hierarchical Contrastive Retrieval module that aligns dense image patches with 3D part latents at both part and object granularity, retrieving from a curated bank of 1,236 part-annotated assets to inject diverse, physically plausible exemplars into denoising. To address the second, we add a masked, part-level editor that operates in a shared canonical space, enabling part swaps, attribute refinements, and compositional updates without regenerating the whole object, while preserving non-target parts and multi-view consistency. PartRAG achieves competitive results on Objaverse, ShapeNet, and ABO, reducing Chamfer Distance from 0.1726 to 0.1528 and raising F-Score from 0.7472 to 0.844 on Objaverse, with 38 s inference and interactive edits in 5-8 s. Qualitatively, PartRAG produces sharper part boundaries, better thin-structure fidelity, and robust behavior on articulated objects. Code: https://github.com/AIGeeksGroup/PartRAG. Website: https://aigeeksgroup.github.io/PartRAG.
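The Hierarchical Contrastive Retrieval module is summarized above only in prose. The sketch below illustrates the general pattern under stated assumptions: dense image-patch features are mean-pooled inside each part's 2D mask to form part-level queries; image and 3D embeddings are aligned with a standard InfoNCE loss at part and object granularity, summed with a hypothetical weight `lam`; and retrieval ranks bank entries by cosine similarity. The pooling, the weighting, the temperature `tau`, the toy features, and all function names are illustrative assumptions, not the paper's exact formulation.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def normalize(v: Vec) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def pool_patches(patches: Sequence[Vec], mask: Sequence[bool]) -> List[float]:
    """Assumed part query: mean of the image-patch features inside a part mask."""
    chosen = [p for p, m in zip(patches, mask) if m]
    if not chosen:
        raise ValueError("mask selects no patches")
    return [sum(col) / len(chosen) for col in zip(*chosen)]


def info_nce(queries: Sequence[Vec], keys: Sequence[Vec], tau: float = 0.07) -> float:
    """Mean InfoNCE: queries[i] should match keys[i] against every other key."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, k) / tau for k in keys]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_z - logits[i]
    return total / len(queries)


def hierarchical_loss(part_img, part_3d, obj_img, obj_3d, lam=0.5, tau=0.07):
    """Part-level term plus a weighted object-level term (the weighting is assumed)."""
    return info_nce(part_img, part_3d, tau) + lam * info_nce(obj_img, obj_3d, tau)


def retrieve_top_k(query: Vec, bank: Sequence[Tuple[str, Vec]], k: int = 3) -> List[str]:
    """Return the names of the k bank entries most cosine-similar to the query."""
    ranked = sorted(bank, key=lambda item: cosine(query, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Toy usage with 2-D features: four patches, two parts, two objects in the batch.
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
masks = ([True, True, False, False], [False, False, True, True])
part_img = [pool_patches(patches, m) for m in masks]
part_3d = [[1.0, 0.0], [0.0, 1.0]]
obj_img, obj_3d = [[1.0, 1.0], [1.0, -1.0]], [[1.0, 1.0], [1.0, -1.0]]

aligned = hierarchical_loss(part_img, part_3d, obj_img, obj_3d)
swapped = hierarchical_loss(part_img, part_3d[::-1], obj_img, obj_3d)
print(aligned < swapped)  # True: correctly matched pairs give the lower loss

bank = [("chair_leg", [1.0, 0.0]), ("lamp_shade", [0.0, 1.0]), ("table_leg", [0.95, 0.05])]
print(retrieve_top_k([1.0, 0.02], bank, k=2))  # ['chair_leg', 'table_leg']
```

A real system would batch these operations as matrix products on accelerator tensors; the loops here only make the per-pair logic explicit.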

Paper Structure (30 sections, 4 equations, 4 figures, 6 tables, 3 algorithms)

Introduction
Related Work
Part-Aware 3D Generation
RAG for Structured Synthesis
Fine-Grained Cross-Modal Representation Learning
The Proposed Method
Overview
Hierarchical Contrastive Retrieval
Encoders.
Loss.
Retrieval Cross-Attention
Training and Inference
Part-Level Editing
Experiments
Dataset and Benchmarks
...and 15 more sections

Figures (4)

Figure 1: PartRAG at a glance. Each row shows three stages: left column, the textured input image; middle column, the generated part-structured 3D meshes with crisp boundaries (shown as gray models); right column, the decomposed individual parts, enabling localized, view-consistent editing. Our retrieval-augmented framework reconstructs part-structured 3D assets across diverse categories and maintains parts in a shared canonical space for interactive manipulation.
Figure 2: PartRAG architecture. Top: retrieval-augmented pipeline generating structured part meshes. Bottom: PartCrafter DiTModel Transformer, Retrieval Module, and Contrastive Learning.
Figure 3: Qualitative comparison. Each row shows the input photograph (left), the HoloPart and PartCrafter baselines (middle), and our PartRAG result (right). Different colors indicate different parts of the object. Our method produces crisper part boundaries and cleaner normals across diverse categories.
Figure 4: Part-level editing across six examples. Our method enables localized, structure-aware edits that preserve non-target parts, honor learned attachment transforms $T_i$, coordinate multi-part changes, and maintain multi-view consistency, achieved in 5-8 s per edit.
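Figure 4 refers to learned attachment transforms $T_i$ without specifying their form. The following is a minimal sketch, assuming each $T_i$ is a 4x4 homogeneous matrix that places part i from the shared canonical space into the object frame. Under that assumption, swapping a part replaces only its canonical geometry, and re-applying the unchanged $T_i$ keeps the new part attached where the old one was while all other parts are untouched. The helpers `apply_transform`, `translation`, `assemble`, and `swap_part`, and the seat/leg toy data, are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Mat4 = Sequence[Sequence[float]]


def apply_transform(T: Mat4, pts: Sequence[Point]) -> List[Point]:
    """Map canonical-space points through a 4x4 homogeneous transform."""
    out = []
    for x, y, z in pts:
        h = [T[r][0] * x + T[r][1] * y + T[r][2] * z + T[r][3] for r in range(4)]
        w = h[3] if h[3] != 0 else 1.0  # guard against a degenerate w
        out.append((h[0] / w, h[1] / w, h[2] / w))
    return out


def translation(tx: float, ty: float, tz: float) -> List[List[float]]:
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def assemble(
    parts: Dict[str, Sequence[Point]], transforms: Dict[str, Mat4]
) -> Dict[str, List[Point]]:
    """Place every canonical part into the object frame with its own T_i."""
    return {name: apply_transform(transforms[name], pts) for name, pts in parts.items()}


def swap_part(
    parts: Dict[str, Sequence[Point]], name: str, new_pts: Sequence[Point]
) -> Dict[str, List[Point]]:
    """Return a copy with one part's canonical geometry replaced; T_i is not touched."""
    if name not in parts:
        raise KeyError(name)
    updated = {k: list(v) for k, v in parts.items()}
    updated[name] = list(new_pts)
    return updated


# Toy usage: a seat and a leg, each stored in canonical space with its own T_i.
parts = {
    "seat": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "leg": [(0.0, 0.0, 0.0), (0.0, -1.0, 0.0)],
}
transforms = {"seat": translation(0.0, 1.0, 0.0), "leg": translation(0.5, 1.0, 0.0)}

edited = swap_part(parts, "leg", [(0.0, 0.0, 0.0), (0.0, -2.0, 0.0)])  # longer leg
placed = assemble(edited, transforms)
print(placed["seat"])  # [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
print(placed["leg"])   # [(0.5, 1.0, 0.0), (0.5, -1.0, 0.0)]
```

Keeping geometry in canonical space and placement in $T_i$ separates "what the part looks like" from "where it attaches", which is why a swap need not regenerate the rest of the object.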
