Preserving Identity with Variational Score for General-purpose 3D Editing

Duong H. Le, Tuan Pham, Aniruddha Kembhavi, Stephan Mandt, Wei-Chiu Ma, Jiasen Lu

TL;DR

Piva (Preserving Identity with Variational Score Distillation) is a novel optimization-based method for editing images and 3D models with diffusion models, inspired by Delta Denoising Score (DDS), a recently proposed method for 2D image editing.

Abstract

We present Piva (Preserving Identity with Variational Score Distillation), a novel optimization-based method for editing images and 3D models based on diffusion models. Specifically, our approach is inspired by Delta Denoising Score (DDS), a recently proposed method for 2D image editing. We pinpoint the limitations of DDS for 2D and 3D editing, which cause detail loss and over-saturation. To address this, we propose an additional score distillation term that enforces identity preservation. This results in a more stable editing process, gradually optimizing NeRF models to match target prompts while retaining crucial input characteristics. We demonstrate the effectiveness of our approach in zero-shot image and neural field editing. Our method successfully alters visual attributes, adds both subtle and substantial structural elements, translates shapes, and achieves competitive results on standard 2D and 3D editing benchmarks. Additionally, our method imposes no constraints such as masking or pre-training, making it compatible with a wide range of pre-trained diffusion models. This allows for versatile editing without needing neural field-to-mesh conversion, offering a more user-friendly experience.
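To make the DDS idea that the abstract builds on more concrete, the sketch below shows a DDS-style update on a toy problem: the edit direction is the difference between two noise predictions, one for the edited image under the target prompt and one for the source image under the source prompt, with both branches sharing the same noise sample and noise level. Everything here is an illustrative assumption: `toy_eps` is a hypothetical closed-form stand-in for a text-conditioned diffusion model, prompts are represented directly by the single image they describe, and the names `add_noise`, `dds_grad`, and `edit` are not from the paper. It is not the authors' implementation.

```python
import math
import random

def add_noise(x, eps, alpha_bar):
    """Forward diffusion: z_t = sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * eps."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * xi + s * ei for xi, ei in zip(x, eps)]

def toy_eps(z_t, prompt_mean, alpha_bar):
    """Hypothetical stand-in for a text-conditioned noise predictor.

    Assumption: each prompt describes exactly one image (prompt_mean),
    in which case the ideal noise prediction has this closed form.
    """
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(zi - a * mi) / s for zi, mi in zip(z_t, prompt_mean)]

def dds_grad(x_edit, x_src, mean_tgt, mean_src, rng):
    """DDS-style direction: target-branch prediction minus source-branch prediction."""
    alpha_bar = rng.uniform(0.1, 0.9)             # random noise level
    eps = [rng.gauss(0.0, 1.0) for _ in x_edit]   # the same noise for both branches
    e_tgt = toy_eps(add_noise(x_edit, eps, alpha_bar), mean_tgt, alpha_bar)
    e_src = toy_eps(add_noise(x_src, eps, alpha_bar), mean_src, alpha_bar)
    return [t - s for t, s in zip(e_tgt, e_src)]

def edit(x_src, mean_src, mean_tgt, steps=500, lr=0.05, seed=0):
    """Start from the source image and follow the DDS-style direction."""
    rng = random.Random(seed)
    x_edit = list(x_src)
    for _ in range(steps):
        g = dds_grad(x_edit, x_src, mean_tgt, mean_src, rng)
        x_edit = [xi - lr * gi for xi, gi in zip(x_edit, g)]
    return x_edit

# Toy "images" with three pixels; the target prompt changes only the last one.
x_src = [0.2, -0.5, 1.0]
mean_src = [0.2, -0.5, 1.0]    # image described by the source prompt
mean_tgt = [0.2, -0.5, -1.0]   # image described by the target prompt
x_out = edit(x_src, mean_src, mean_tgt)
```

Because the two branches share the same noise, the noise terms cancel in the difference and only the prompt-dependent change remains, so pixels on which the two prompts agree are left untouched. This idealized toy has no estimation error, so it does not reproduce the detail loss or over-saturation described above, and it omits Piva's additional identity-preservation term.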

Paper Structure

This paper contains 35 sections, 8 equations, 14 figures, 5 tables, and 1 algorithm.

Figures (14)

  • Figure 1: Illustrating the content drifting problem of DDS (Hertz et al., 2023) and the behavior of our method on image editing. While DDS avoids blurry results and maintains the background during the first few training steps, it deviates significantly from the input after a few hundred steps and returns over-saturated and irrelevant results.
  • Figure 2: Illustrating the editing pipeline of our approach. Given a NeRF model, a target prompt $\mathbf{c}_{tgt}$ that describes the desired editing result, and a source prompt $\mathbf{c}_{src}$ that describes the original NeRF model, we initialize the edited model $\theta_{edited}$ from the input NeRF and update it according to Equation \ref{eq:dds_7}. At the same time, we fine-tune the source and target diffusion models $\mathbf{D}_\phi$ and $\mathbf{D}_\psi$ with the diffusion loss, so that the two models approximate the scores of the distributions of images sampled from the original and edited NeRFs, respectively.
  • Figure 3: Results of 2D editing with our method and baselines. We use the results from DirectInversion (Ju et al., 2023) for every method.
  • Figure 4: Piva results on the Noe benchmark.
  • Figure 5: Real-world NeRF editing over the face and person-small scenes from the IN2N dataset. The leftmost images are the original images, followed by IN2N and our method Piva.
  • ...and 9 more figures