Perturb-and-Revise: Flexible 3D Editing with Generative Trajectories

Susung Hong, Johanna Karras, Ricardo Martin-Brualla, Ira Kemelmacher-Shlizerman

TL;DR

Perturb-and-Revise (PnR) tackles text-guided editing of NeRF-based 3D scenes by combining three components: parameter-space perturbation, a generative trajectory driven by score distillation, and an Identity-Preserving Gradient (IPG). It sets the perturbation level automatically from an analysis of the local loss landscape, which enables substantial geometry and appearance changes while preserving identity, all in a training-free framework with multi-view consistency and timestep annealing. Key contributions include adaptive parameter perturbation, loss-landscape-driven selection of the perturbation level $\eta$, and IPG refinement that balances fidelity to the source against adherence to the edit prompt. The paper reports state-of-the-art results on fashion and Objaverse objects and extends the method to real scenes. The method offers fast, flexible 3D editing suited to animation, design, and AR/VR workflows; its limitations stem from diffusion-model biases and compositionality challenges, which are left for future work.

Abstract

Recent advancements in text-based diffusion models have accelerated progress in 3D reconstruction and text-based 3D editing. Although existing 3D editing methods excel at modifying color, texture, and style, they struggle with extensive geometric or appearance changes, which limits their applications. To this end, we propose Perturb-and-Revise, which enables a wide variety of NeRF edits. First, we perturb the NeRF parameters with random initializations to create a versatile initialization. The level of perturbation is determined automatically through analysis of the local loss landscape. Then, we revise the edited NeRF via generative trajectories. Combined with the generative process, we impose identity-preserving gradients to refine the edited NeRF. Extensive experiments demonstrate that Perturb-and-Revise facilitates flexible, effective, and consistent editing of color, appearance, and geometry in 3D. For 360° results, please visit our project page: https://susunghong.github.io/Perturb-and-Revise.
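
The perturbation step can be illustrated with a small sketch. The abstract does not give the exact formula, so the code below assumes a linear blend between the source parameters and a fresh random initialization, controlled by a level `eta` in [0, 1]. The function name `perturb_parameters`, the Gaussian initializer, and `init_scale` are hypothetical choices for illustration, and a flat NumPy array stands in for the NeRF weights.

```python
import numpy as np


def perturb_parameters(theta_src, eta, rng, init_scale=0.1):
    """Blend source parameters with a fresh random initialization.

    eta = 0 returns the source parameters unchanged; eta = 1 discards them
    entirely (complete regeneration). The linear blend and the Gaussian
    initializer are assumptions made for illustration.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    # Hypothetical stand-in for a NeRF's random initialization.
    theta_rand = rng.normal(0.0, init_scale, size=theta_src.shape)
    return (1.0 - eta) * theta_src + eta * theta_rand


rng = np.random.default_rng(0)
theta_src = np.ones(8)  # stand-in for trained NeRF weights
theta_ptb = perturb_parameters(theta_src, eta=0.6, rng=rng)
```

Under this assumption, a larger `eta` moves the starting point further from the source scene, which gives the later revision stage more room for geometric change at the cost of retaining less of the original.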

Paper Structure

This paper contains 41 sections, 16 equations, 15 figures, 4 tables, 2 algorithms.

Figures (15)

  • Figure 1: We qualitatively compare our method to Instruct-NeRF2NeRF [haque2023instruct], Instruct-GS2GS [igs2gs], and Posterior Distillation [koo2024posterior]. In this task, the goal is to edit the source "bear" into a "polar bear." Compared to related works, our approach reconstructs a more realistic face and better overall geometry. An asterisk (*) denotes that we use an identical update rule and schedule.
  • Figure 2: Conceptual figure. The target distribution in the figure represents the conditional distribution of NeRF parameters relative to the edit prompt, and $\mathcal{P}(\Theta)$ denotes the distribution of randomly initialized NeRF parameters. First, parameter perturbation enables the parameters to escape from local minima and follow a natural generative path. Subsequently, during the refining process, the tug-of-war between two vectors, $\lambda_{\text{d}}\nabla_{\theta}d(\theta_\tau, \theta_{\text{src}})$ (the red arrow) and $d\theta_\tau$ (the blue arrow), pushes the actual parameters into a region that is closer to either the source parameters or the high-density region specified by the edit prompt, following the purple arrow.
  • Figure 3: Effect of parameter perturbation. In this example, we aim to make a NeRF model of a standing person sit down using the word "sitting." The scene converges quickly even with large perturbations ($\eta = 0.6$), while complete regeneration yields blurry rendering results given the same number of optimization steps.
  • Figure 4: Baseline comparisons with a wide range of edits. We compare our method with Score Distillation Sampling (SDS) [poole2022dreamfusion], Posterior Distillation Sampling (PDS) [koo2024posterior], and Instruct-NeRF2NeRF [haque2023instruct]. For SDS and PDS, we use MVDream [shi2023mvdream] as the backbone for fair comparison. SDS alters the appearance and texture of the source objects and is unable to handle edits that require extensive geometric changes (3rd, 4th, and 5th rows). PDS is not capable of making significant edits and cannot deviate far from local minima due to its high preservation term from the start [koo2024posterior]. While Instruct-NeRF2NeRF changes the texture of objects as desired, it cannot address geometric changes. In contrast, our method is capable of various types of edits, including those involving large geometric changes.
  • Figure 5: Baseline comparisons of editing various general 3D objects from the Objaverse dataset [deitke2023objaverse].
  • ...and 10 more figures
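
The tug-of-war described in the Figure 2 caption can be sketched as a single update step. The caption leaves the distance $d$ unspecified, so the code below assumes $d(\theta, \theta_{\text{src}}) = \tfrac{1}{2}\lVert\theta - \theta_{\text{src}}\rVert^2$, and it assumes the identity term is subtracted from the generative update $d\theta_\tau$. The names `identity_gradient`, `revise_step`, and `step_size` are hypothetical, and `d_theta` is passed in directly rather than computed from a diffusion model.

```python
import numpy as np


def identity_gradient(theta, theta_src):
    """Gradient of the assumed distance 0.5 * ||theta - theta_src||^2."""
    return theta - theta_src


def revise_step(theta, theta_src, d_theta, lambda_d, step_size=1.0):
    """Apply the generative update, then pull the result toward the source.

    d_theta  : generative update direction (the blue arrow in Figure 2)
    lambda_d : weight on the identity-preserving term (the red arrow)
    """
    pull = lambda_d * identity_gradient(theta, theta_src)
    return theta + d_theta - step_size * pull


theta_src = np.zeros(2)
theta = np.array([2.0, 0.0])
d_theta = np.array([0.0, 1.0])  # placeholder for a score-distillation update
theta_next = revise_step(theta, theta_src, d_theta, lambda_d=0.5)
```

With `lambda_d = 0` the step reduces to the pure generative update; increasing `lambda_d` keeps the parameters closer to the source, which matches the trade-off the caption describes.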