SKED: Sketch-guided Text-based 3D Editing

Aryan Mikaeili, Or Perel, Mehdi Safaee, Daniel Cohen-Or, Ali Mahdavi-Amiri

TL;DR

SKED tackles the challenge of editing 3D shapes represented by Neural Radiance Fields (NeRFs) using minimal user input: two multiview sketches and a text prompt. It couples a diffusion-guided SDS objective with two novel losses, $\mathcal{L}_{pres}$ and $\mathcal{L}_{sil}$, to constrain edits to sketch regions while preserving the base geometry and radiance, enabling localized, semantically consistent modifications. The method operates in a zero-shot setting, using an editable NeRF $F_e$ initialized from a base $F_o$, optimized over $10{,}000$ iterations with an occupancy-grid strategy on RTX 3090 hardware, and demonstrates both qualitative and quantitative gains (IoS, CLIP similarity, PSNR) over text-only baselines. This work advances interactive 3D editing by leveraging sparse user sketches in combination with diffusion priors, offering practical, sketch-guided 3D content creation with potential extensions to richer materials and animation.

Abstract

Text-to-image diffusion models are gradually being introduced into computer graphics, recently enabling the development of Text-to-3D pipelines in an open domain. However, for interactive editing purposes, local manipulations of content through a simplistic textual interface can be arduous. Incorporating user-guided sketches with Text-to-image pipelines offers users more intuitive control. Still, as state-of-the-art Text-to-3D pipelines rely on optimizing Neural Radiance Fields (NeRF) through gradients from arbitrary rendering views, conditioning on sketches is not straightforward. In this paper, we present SKED, a technique for editing 3D shapes represented by NeRFs. Our technique utilizes as few as two guiding sketches from different views to alter an existing neural field. The edited region respects the prompt semantics through a pre-trained diffusion model. To ensure the generated output adheres to the provided sketches, we propose novel loss functions to generate the desired edits while preserving the density and radiance of the base instance. We demonstrate the effectiveness of our proposed method through several qualitative and quantitative experiments. https://sked-paper.github.io/

Paper Structure (28 sections, 8 equations, 15 figures, 7 tables)


Figures (15)

  • Figure 1: Examples of our Sketch-guided, Text-based 3D editing method. Taking a pretrained Neural Radiance Field as input, multiview sketches determining the coarse region of edit and a text-prompt, our method is able to generate a localized, meaningful edit.
  • Figure 2: An overview of SKED. We render the base NeRF model $F_o$ from at least two views and sketch over them ($C_i$). The inputs to the editing algorithm are these sketches, preprocessed into masks ($M_i$), and a text prompt. In each iteration, similar to DreamFusion (Poole et al., 2022), we render a random view and apply the Score Distillation Loss to semantically align with the text prompt. Additionally, we compute $\mathcal{L}_{pres}$ to preserve the base NeRF by constraining $F_e$'s density and color output to be similar to $F_o$ away from the sketch regions. Finally, we use the object mask renderings of the sketch views to define $\mathcal{L}_{sil}$. This loss ensures that the object mask occupies the sketch regions.
  • Figure 3: 3D points $\mathbf{p}_i$ sampled at random views are projected to the sketch views $C_j$ to obtain $\Pi(\mathbf{p}_i, C_j)$. In each $C_j$, the distance $d_j$ between the projected points and the pixels containing the sketch masks is computed. The red color in $C_1$ and $C_2$ shows $d_1(\mathbf{p})$ and $d_2(\mathbf{p})$ in image space, respectively. Finally, for each 3D point, the distances $d_j(\mathbf{p}_i)$ are averaged to get the distance $D(\mathbf{p}_i)$ to all sketch views. $D(\mathbf{p}_i)$ is used as the weight $w_i$ of each point in $\mathcal{L}_{pres}$.
  • Figure 4: Examples of using SKED to edit various objects reconstructed with InstantNGP (Müller et al., 2022) (anime girl) or generated with DreamFusion (Poole et al., 2022) (plant, sundae, teddy bear, sundae, cupcake). All examples were edited using two sketch views and the text prompt.
  • Figure 5: Various types of edits. SKED is capable of overwriting parts of the base model (Cactus, Skirt), as well as adding new details (Pancake, Teddy).
  • ...and 10 more figures
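
The distance weighting described in the Figure 3 caption can be illustrated with a small NumPy sketch. This is not the authors' implementation: the pinhole camera model, the brute-force nearest-pixel search, and every function and parameter name here (`project`, `distance_to_mask`, `sketch_distance`, `intrinsics`, `world_to_cams`) are assumptions made for illustration. It only shows how per-view image-space distances $d_j(\mathbf{p}_i)$ could be averaged into $D(\mathbf{p}_i)$.

```python
import numpy as np

def project(points, K, w2c):
    """Assumed pinhole projection of (N, 3) world points to (N, 2) pixel coords.

    Assumes every point lies in front of the camera (positive depth).
    """
    cam = points @ w2c[:3, :3].T + w2c[:3, 3]   # world frame -> camera frame
    uv = cam @ K.T                               # camera frame -> homogeneous pixels
    return uv[:, :2] / uv[:, 2:3]                # perspective divide

def distance_to_mask(pixels, mask):
    """d_j: distance from each (x, y) pixel to the nearest True pixel of a mask.

    Brute force over all mask pixels; assumes the mask is non-empty.
    """
    ys, xs = np.nonzero(mask)
    mask_xy = np.stack([xs, ys], axis=1).astype(float)   # (M, 2)
    diff = pixels[:, None, :] - mask_xy[None, :, :]      # (N, M, 2)
    return np.linalg.norm(diff, axis=-1).min(axis=1)     # (N,)

def sketch_distance(points, masks, intrinsics, world_to_cams):
    """D(p_i): mean over sketch views j of the image-space distance d_j(p_i)."""
    per_view = [
        distance_to_mask(project(points, K, w2c), mask)
        for mask, K, w2c in zip(masks, intrinsics, world_to_cams)
    ]
    return np.mean(per_view, axis=0)
```

With two sketch views, `sketch_distance` returns one value per 3D point that is zero only where the point projects onto the sketch mask in every view, so points far from the sketched region receive larger values. How these values are scaled or clipped before being used as $w_i$ in $\mathcal{L}_{pres}$ is not stated in the caption and is left out of this sketch.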