DreamSteerer: Enhancing Source Image Conditioned Editability using Personalized Diffusion Models

Zhengyang Yu, Zhaoyuan Yang, Jing Zhang

TL;DR

This work enhances the source image conditioned editability of a personalized diffusion model via a novel Editability Driven Score Distillation (EDSD) objective and employs two key modifications to the Delta Denoising Score framework that enable high-fidelity local editing with personalized concepts.

Abstract

Recent text-to-image (T2I) personalization methods have shown great promise in teaching a diffusion model user-specified concepts from a few images, so that the acquired concepts can be reused in novel contexts. With massive efforts being dedicated to personalized generation, a promising extension is personalized editing, namely editing an image using personalized concepts, which can provide a more precise guidance signal than traditional textual guidance. A straightforward solution is to combine a personalized diffusion model with a text-driven editing framework. However, such a solution often shows unsatisfactory editability on the source image. To address this, we propose DreamSteerer, a plug-in method for augmenting existing T2I personalization methods. Specifically, we enhance the source image conditioned editability of a personalized diffusion model via a novel Editability Driven Score Distillation (EDSD) objective. Moreover, we identify a mode trapping issue with EDSD and propose a mode shifting regularization with spatial feature guided sampling to avoid it. We further employ two key modifications to the Delta Denoising Score framework that enable high-fidelity local editing with personalized concepts. Extensive experiments validate that DreamSteerer can significantly improve the editability of several T2I personalization baselines while being computationally efficient.

Paper Structure

This paper contains 52 sections, 15 equations, 24 figures, 5 tables.

Figures (24)

  • Figure 1: DreamSteerer enables efficient editability enhancement for a source image with any existing T2I personalization model, leading to significantly improved editing fidelity in various challenging scenarios. When the structural difference between the source and reference images is significant, it can naturally adapt to the source while maintaining the appearance learned from the personal concept.
  • Figure 2: Source class bias of DreamBooth trained for "plushie_tortoise".
  • Figure 3: Overall framework of DreamSteerer (the gradient flows are illustrated with dashed lines).
  • Figure 4: The effect of different regularization strategies on the editing and generation results of a DreamBooth baseline. The source prompt is "a photo of a cat sitting next to a mirror".
  • Figure 5: Illustration of the effect of the proposed components on editing with a DreamBooth baseline (1st row shows the editing results; 2nd row shows the editing directions, where brown means zero).
  • ...and 19 more figures