Mesh RAG: Retrieval Augmentation for Autoregressive Mesh Generation

Xiatao Sun, Chen Liang, Qian Wang, Daniel Rakita

TL;DR

This work tackles the bottleneck of quality-speed trade-offs and limited editability in autoregressive 3D mesh generation. It introduces Mesh RAG, a training-free, plug-and-play framework that segments a point-cloud prompt, generates parts in parallel, and uses a two-stage transformation retrieval (coarse AABB alignment followed by ICP refinement) to place parts coherently, enabling incremental editing without retraining. Across multiple autoregressive baselines, Mesh RAG yields substantial gains in geometric fidelity and, for larger models, faster inference, while enabling precise, localized edits. The approach generalizes to multi-modal prompts via an intermediate SLAT representation, with an open-source implementation to spur further retrieval-augmented mesh research.

Abstract

3D meshes are a critical building block for applications ranging from industrial design and gaming to simulation and robotics. Traditionally, meshes are crafted manually by artists, a process that is time-intensive and difficult to scale. To automate and accelerate this asset creation, autoregressive models have emerged as a powerful paradigm for artistic mesh generation. However, current methods to enhance quality typically rely on larger models or longer sequences that result in longer generation time, and their inherent sequential nature imposes a severe quality-speed trade-off. This sequential dependency also significantly complicates incremental editing. To overcome these limitations, we propose Mesh RAG, a novel, training-free, plug-and-play framework for autoregressive mesh generation models. Inspired by RAG for language models, our approach augments the generation process by leveraging point cloud segmentation, spatial transformation, and point cloud registration to retrieve, generate, and integrate mesh components. This retrieval-based approach decouples generation from its strict sequential dependency, facilitating efficient and parallelizable inference. We demonstrate the wide applicability of Mesh RAG across various foundational autoregressive mesh generation models, showing it significantly enhances mesh quality, accelerates generation speed compared to sequential part prediction, and enables incremental editing, all without model retraining.
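
To make the two-stage transformation retrieval named in the TL;DR (coarse AABB alignment followed by ICP refinement) more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names `aabb_align`, `best_rigid`, `nn_mse`, and `icp_refine` are hypothetical, the coarse step assumes a uniform scale plus translation that matches axis-aligned bounding boxes, and the refinement is plain point-to-point ICP with brute-force nearest neighbours.

```python
import numpy as np

def aabb_align(src, tgt):
    """Coarse step (assumed form): uniform scale + translation matching the AABBs."""
    s_min, s_max = src.min(0), src.max(0)
    t_min, t_max = tgt.min(0), tgt.max(0)
    scale = (t_max - t_min).max() / (s_max - s_min).max()
    s_center = 0.5 * (s_min + s_max)
    t_center = 0.5 * (t_min + t_max)
    return (src - s_center) * scale + t_center

def best_rigid(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (Kabsch)."""
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cd - R @ cs

def nearest(src, tgt):
    """Brute-force nearest neighbour in tgt for every point of src."""
    d2 = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(1)
    return idx, d2[np.arange(len(src)), idx]

def nn_mse(src, tgt):
    """Mean squared distance from each src point to its nearest tgt point."""
    return nearest(src, tgt)[1].mean()

def icp_refine(src, tgt, iters=20):
    """Refinement step: point-to-point ICP starting from the coarse alignment."""
    cur = src.copy()
    for _ in range(iters):
        idx, _ = nearest(cur, tgt)
        R, t = best_rigid(cur, tgt[idx])
        cur = cur @ R.T + t
    return cur
```

In this sketch, a part generated in a normalized frame would be placed by calling `icp_refine(aabb_align(part_points, segment_points), segment_points)`, where `part_points` are samples from the generated part and `segment_points` is the corresponding segment of the point-cloud prompt (both names are illustrative). A practical pipeline would likely replace the brute-force search with a k-d tree and add a convergence check.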

Paper Structure

This paper contains 21 sections, 3 equations, 8 figures, 4 tables, 1 algorithm.

Figures (8)

  • Figure 1: Parallel generation and incremental editing with Mesh RAG. Our framework enhances autoregressive models by enabling parallel generation (Top) to improve quality and reduce inference time. It also supports efficient incremental editing (Bottom), allowing for localized generation of newly added or edited segments.
  • Figure 2: Workflows for the core components of Mesh RAG, parallel generation, and editing. Mesh RAG includes a point cloud segmentation module that leverages Sonata [wu2025sonata] and P3-SAM [ma2025p3] for part segmentation on point clouds. It also includes a transformation retrieval module that first performs a coarse alignment using the known transformation for the point cloud prompt, and then uses iterative closest point (ICP) to refine the alignment.
  • Figure 3: Qualitative comparison of autoregressive mesh generation models with and without Mesh RAG. Our plug-and-play framework significantly boosts the generation quality of baseline models (MeshAnything, MeshAnything V2, and BPT). By processing each segment with a dedicated context, Mesh RAG produces more complete geometries and preserves finer details. Additional qualitative results for TreeMeshGPT and DeepMesh are in the supplementary material.
  • Figure 4: Generation time (minutes) as a function of batch size. Each line corresponds to a different model augmented with our Mesh RAG framework.
  • Figure 5: Qualitative comparison of incremental editing. The Ground Truth visualizes the task: adding the red segments to the initial grey mesh. All generated outputs are rendered in a uniform cyan for fair comparison.
  • ...and 3 more figures