PrEditor3D: Fast and Precise 3D Shape Editing

Ziya Erkoç, Can Gümeli, Chaoyang Wang, Matthias Nießner, Angela Dai, Peter Wonka, Hsin-Ying Lee, Peiye Zhuang

TL;DR

PrEditor3D tackles the challenge of fast, precise 3D editing without retraining diffusion models. It couples synchronized multi-view 2D editing with a 3D lifting and merging mechanism to confine edits to user-specified regions while preserving the rest of the shape. The method uses four orthogonal views and a color-coded 3D segmentation via GTR to identify edited regions and apply a robust averaging-based merge, enabling high fidelity and smooth boundaries. Quantitative results across GPTEval3D and directional CLIP metrics, plus user studies, show substantial improvements over state-of-the-art baselines in both visual quality and region preservation, with runtimes suitable for iterative artistic workflows.

Abstract

We propose a training-free approach to 3D editing that enables the editing of a single shape within a few minutes. The edited 3D mesh aligns well with the prompts and remains identical in regions that are not intended to be altered. To this end, we first project the 3D object onto 4-view images and perform synchronized multi-view image editing guided by user-provided text prompts and rough masks. However, the targeted regions to be edited are ambiguous due to projection from 3D to 2D. To ensure precise editing only in intended regions, we develop a 3D segmentation pipeline that detects edited areas in 3D space, followed by a merging algorithm to seamlessly integrate edited 3D regions with the original input. Extensive experiments demonstrate the superiority of our method over previous approaches, enabling fast, high-quality editing while preserving regions that were not meant to be edited.
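
To make the 4-view projection step concrete, the following is a minimal sketch of how four orthogonal viewpoints around an object could be set up. It is not the authors' renderer: the function names `orthogonal_view_cameras` and `project`, the y-up world, the zero elevation, the camera radius, and the orthographic projection are all assumptions chosen for illustration.

```python
import math


def orthogonal_view_cameras(radius=2.0):
    """Return (eye, right, up, forward) tuples for 4 views spaced 90 degrees apart.

    Assumptions (not from the paper): y-up world, zero elevation,
    every camera looks at the origin.
    """
    cameras = []
    for azim_deg in (0, 90, 180, 270):
        azim = math.radians(azim_deg)
        eye = (radius * math.sin(azim), 0.0, radius * math.cos(azim))
        forward = (-math.sin(azim), 0.0, -math.cos(azim))  # unit vector toward the origin
        right = (math.cos(azim), 0.0, -math.sin(azim))     # forward x world-up
        up = (0.0, 1.0, 0.0)
        cameras.append((eye, right, up, forward))
    return cameras


def project(point, camera):
    """Orthographic projection of a 3D point onto a camera's image plane."""
    _, right, up, _ = camera
    x = sum(p * r for p, r in zip(point, right))
    y = sum(p * u for p, u in zip(point, up))
    return x, y


if __name__ == "__main__":
    for cam in orthogonal_view_cameras():
        print(project((1.0, 0.5, 0.0), cam))
```

A point on the x-axis lands at the right edge of the front view, at the horizontal center of the side views, and at the left edge of the back view. This illustrates why a 2D mask drawn in one view cannot pin down a 3D region on its own: every point along a viewing ray falls on the same pixel.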

Paper Structure

This paper contains 16 sections, 9 equations, 9 figures, 5 tables, 1 algorithm.

Figures (9)

  • Figure 1: PrEditor3D is a (top) fast and high-quality editing method that can perform precise and consistent editing only in the intended regions, keeping the rest identical. (mid) It can handle diverse editing prompts with any given 3D object. (bottom) Furthermore, it can support iterative editing, facilitating artistic workflow, and can also support editing multiple regions in a single run.
  • Figure 2: Ambiguous intended regions. The intended region to be edited is clear in 3D (e.g. the cat tail). However, after projecting to 2D, regardless of the granularity of the user-provided masks, the editing will either alter some unintended regions (e.g. the robot cat) or be too limited for reasonable editing.
  • Figure 3: Overview of PrEditor3D. Given an input 3D object, we first render its multi-view images from 4 orthogonal views. We then obtain editing input from the user, describing in text as well as rough 2D masks the desired edits. We perform synchronized multi-view editing based on the text prompts as well as the user-provided masks (Sec. 3.1). Due to the rough masks and the unclear intended regions caused by ambiguous 3D-2D projection, we detect the intended regions with Grounding DINO and SAM 2 (Sec. 3.2), where the segmentation results are lifted to 3D for the final merging operation (Sec. 3.3).
  • Figure 4: Qualitative comparison. Our method handles diverse edits and modifies only the intended regions.
  • Figure 5: More editing results from PrEditor3D. Our method can perform a wide range of editing on various shapes.
  • ...and 4 more figures
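
The lifting and merging steps named in the Figure 3 caption can be sketched as follows. This is an illustrative simplification and not the paper's algorithm: it assumes the shape is a list of 3D points with one attribute each, uses orthographic projection, ignores occlusion, and marks a point as edited when enough views agree. The names `lift_masks_to_points` and `merge`, the `min_votes` threshold, and the `extent` parameter are hypothetical.

```python
def lift_masks_to_points(points, views, min_votes=2, extent=1.0):
    """Flag each 3D point as edited when at least `min_votes` views see it inside their mask.

    points: list of (x, y, z) tuples.
    views:  list of (mask, right, up), where mask is a 2D list of bools (rows x cols)
            and right/up are the camera's image-plane axes in world coordinates.
    Assumptions: orthographic projection, no occlusion test, scene fits in
    [-extent, extent] along both image axes.
    """
    flags = []
    for p in points:
        votes = 0
        for mask, right, up in views:
            h, w = len(mask), len(mask[0])
            x = sum(a * b for a, b in zip(p, right))
            y = sum(a * b for a, b in zip(p, up))
            # Map image-plane coordinates to pixel indices (row 0 is the top).
            col = min(w - 1, max(0, round((x + extent) / (2 * extent) * (w - 1))))
            row = min(h - 1, max(0, round((extent - y) / (2 * extent) * (h - 1))))
            votes += bool(mask[row][col])
        flags.append(votes >= min_votes)
    return flags


def merge(original, edited, flags):
    """Take the edited attribute where flagged, keep the original everywhere else."""
    return [e if f else o for o, e, f in zip(original, edited, flags)]


if __name__ == "__main__":
    # 8x8 mask covering the upper-right quadrant of the image.
    mask = [[r < 4 and c >= 4 for c in range(8)] for r in range(8)]
    views = [(mask, (1, 0, 0), (0, 1, 0))] * 2
    pts = [(0.5, 0.5, 0.0), (-0.5, -0.5, 0.0)]
    flags = lift_masks_to_points(pts, views)
    print(merge(["orig_a", "orig_b"], ["edit_a", "edit_b"], flags))
```

Because unflagged points are copied straight from the original, regions outside the detected edit stay bit-identical in this sketch, which mirrors the preservation goal stated in the abstract. A soft blend near the mask boundary would be a natural extension for smoother seams.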