DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation

Jiwook Kim, Seonho Lee, Jaeyo Shin, Jiho Choi, Hyunjung Shim

TL;DR

DreamCatalyst reinterprets Score Distillation Sampling (SDS) editing as an approximation of the diffusion reverse process, which addresses the slow training and low editing quality of existing SDS-based methods. It introduces a diffusion-aware objective with two timestep-dependent weighting functions, $\Phi^{*}(t)$ for identity preservation and $\Psi^{*}(t)$ for editability, and applies FreeU to improve editing quality without additional computational cost or memory usage. The method offers two modes: a fast mode that edits NeRF scenes roughly 23x faster than current state-of-the-art NeRF editing methods, and a high-quality mode that produces superior results while still running about 8x faster. DreamCatalyst also surpasses state-of-the-art 3D Gaussian Splatting (3DGS) editing methods. Across qualitative comparisons, quantitative metrics, and user studies, it shows better editability, identity preservation, and speed, making it a model-agnostic framework for text-driven 3D editing.

Abstract

Score distillation sampling (SDS) has emerged as an effective framework in text-driven 3D editing tasks, leveraging diffusion models for 3D-consistent editing. However, existing SDS-based 3D editing methods suffer from long training times and produce low-quality results. We identify that the root cause of this performance degradation is their conflict with the sampling dynamics of diffusion models. Addressing this conflict allows us to treat SDS as a diffusion reverse process for 3D editing via sampling from data space. In contrast, existing methods naively distill the score function using diffusion models. From these insights, we propose DreamCatalyst, a novel framework that considers these sampling dynamics in the SDS framework. Specifically, we devise the optimization process of our DreamCatalyst to approximate the diffusion reverse process in editing tasks, thereby aligning with diffusion sampling dynamics. As a result, DreamCatalyst successfully reduces training time and improves editing quality. Our method offers two modes: (1) a fast mode that edits Neural Radiance Fields (NeRF) scenes approximately 23 times faster than current state-of-the-art NeRF editing methods, and (2) a high-quality mode that produces superior results about 8 times faster than these methods. Notably, our high-quality mode outperforms current state-of-the-art NeRF editing methods in terms of both speed and quality. DreamCatalyst also surpasses the state-of-the-art 3D Gaussian Splatting (3DGS) editing methods, establishing itself as an effective and model-agnostic 3D editing solution. See more extensive results on our project page: https://dream-catalyst.github.io.
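
The summary and figure captions describe an update that balances an identity-preservation term against an editability term, each scaled by a timestep-dependent coefficient. The sketch below only illustrates that general shape. The weighting schedules `phi_placeholder` and `psi_placeholder` are arbitrary stand-ins and not the paper's $\Phi^{*}(t)$ and $\Psi^{*}(t)$, the noise predictions are passed in as plain arrays rather than produced by a diffusion model, and the function name `editing_gradient` is hypothetical.

```python
import numpy as np


def phi_placeholder(t: float) -> float:
    """Hypothetical identity-preservation weight (NOT the paper's Phi*(t))."""
    return 1.0 - t


def psi_placeholder(t: float) -> float:
    """Hypothetical editability weight (NOT the paper's Psi*(t))."""
    return t


def editing_gradient(z, z_src, eps_tgt, eps_src, t,
                     phi=phi_placeholder, psi=psi_placeholder):
    """Combine an identity term and an editability term at timestep t in [0, 1].

    z       : current latent of the edited render
    z_src   : latent of the source render
    eps_tgt : noise predicted under the target prompt (assumed given)
    eps_src : noise predicted under the source prompt (assumed given)
    """
    identity_term = z - z_src      # gradient of 0.5 * ||z - z_src||^2
    edit_term = eps_tgt - eps_src  # DDS-style difference of noise predictions
    return phi(t) * identity_term + psi(t) * edit_term


# Toy inputs, chosen only to make the arithmetic easy to follow.
z = np.array([1.0, 2.0])
z_src = np.array([0.0, 0.0])
eps_tgt = np.array([0.5, 0.5])
eps_src = np.array([0.25, 0.0])

grad_mid = editing_gradient(z, z_src, eps_tgt, eps_src, t=0.5)
```

With these placeholder schedules, `t=0` keeps only the identity term and `t=1` keeps only the editability term. How the actual coefficients vary with the timestep is what Figure 2 of the paper compares between PDS and DreamCatalyst.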

Paper Structure (37 sections, 13 equations, 21 figures, 6 tables)

Figures (21)

  • Figure 1: Examples of 3D editing obtained by DreamCatalyst. DreamCatalyst edits 3D scenes based on the given text prompt. Its edits are high quality and align with the prompt while effectively preserving the identity of the scenes, and they are produced faster than with existing methods.
  • Figure 1: Quantitative comparison on NeRF scenes. Ours outperforms the baseline methods on NeRF editing. Bold represents the best result, and underline indicates the second-best result.
  • Figure 2: Comparison of coefficients between PDS and our method across different timesteps. We plot the weighting functions of PDS and DreamCatalyst in (a) and (b), respectively. $\Phi^{\text{PDS}}$ and $\Psi^{\text{PDS}}$ denote the identity-preservation and editability coefficients of PDS, respectively. $\Phi^{*}$ and $\Psi^{*}$ are the identity-preservation and editability coefficients of our method, respectively. $\Psi^{*}_{2}$ and $\Psi^{*}_{3}$ correspond to additional special cases.
  • Figure 3: Overall architecture. DreamCatalyst approximates inversion-based SDEdit with DDS loss and identity regularizer $\mathcal{R}_{\text{iden}}$. Furthermore, DreamCatalyst utilizes FreeU to enhance 3D editing quality without additional computational cost and memory usage.
  • Figure 3: User studies. We conduct user studies to measure human preference across three criteria. Our method is preferred over the baselines. Bold indicates the best result.
  • ...and 16 more figures