Identity-preserving Distillation Sampling by Fixed-Point Iterator
SeonHwa Kim, Jiwon Kim, Soobin Park, Donghoon Ahn, Jiwon Kang, Seungryong Kim, Kyong Hwan Jin, Eunju Cha
TL;DR
We address blurriness and identity drift in Score Distillation Sampling (SDS) for text-guided editing by introducing Identity-preserving Distillation Sampling (IDS) with Fixed-point Iterative Regularization (FPR). IDS explicitly corrects the text-conditioned score toward the source identity: it refines the posterior mean via Tweedie’s formula and uses guided noise from a re-estimated source latent, which enables stable, structure-preserving edits in both 2D images and editable NeRF. Empirically, IDS with FPR outperforms the baselines (DDS, CDS, P2P, PnP) on 2D image editing metrics (LPIPS, IoU, PSNR) and on CLIP-based NeRF evaluations, and ablations show that both the number of FPR iterations and the FPR scale matter. The approach is a practical, modular regularizer for diffusion-based editing that improves identity preservation while maintaining prompt fidelity. The authors note its limitations and point to target-aware scoring and reduced computation as directions for extension.
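For readers unfamiliar with the posterior mean mentioned above, the standard Tweedie-style estimate for a variance-preserving diffusion model can be written as follows. The notation is an assumption chosen for illustration and may differ from the paper's: $z_t$ is the noised latent at timestep $t$, $\bar{\alpha}_t$ is the cumulative noise-schedule coefficient, $\epsilon_\phi$ is the pretrained noise predictor, and $y$ is the text prompt.

```latex
% Forward noising of a clean latent z_0 (standard DDPM form, assumed notation)
z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

% Posterior-mean (Tweedie) estimate of the clean latent from z_t
\hat{z}_{0 \mid t}
  = \frac{z_t - \sqrt{1 - \bar{\alpha}_t}\; \epsilon_\phi(z_t, y, t)}
         {\sqrt{\bar{\alpha}_t}}
```

If the predicted noise equals the noise that was actually injected, $\hat{z}_{0 \mid t}$ recovers $z_0$ exactly; any gap between the two is the kind of mismatch an identity-preserving correction would aim to shrink.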
Abstract
Score distillation sampling (SDS) is a powerful approach to text-conditioned 2D image and 3D object generation that distills knowledge from learned score functions. However, SDS often produces blurry results because of noisy gradients. In image editing, such degradation can be reduced by correcting the bias shift with reference pairs, but these de-biasing techniques are still corrupted by erroneous gradients. To address this, we introduce Identity-preserving Distillation Sampling (IDS), which compensates for the gradient components that cause undesired changes in the results. Based on the analysis that these errors originate from the text-conditioned scores, we propose a new regularization technique, fixed-point iterative regularization (FPR), that modifies the score itself and thereby preserves identity, including pose and structure. Through the self-correction provided by FPR, the proposed method yields clear and unambiguous results that match the given prompts in image-to-image editing and editable neural radiance fields (NeRF). Structural consistency between the source and the edited data is maintained better than with other state-of-the-art methods.
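The self-correction idea can be illustrated with a toy sketch. This is not the authors' implementation: `toy_eps_model` is a hypothetical stand-in for a text-conditioned noise predictor, the residual-based update rule and the `step` and `n_iters` parameters are assumptions made for illustration, and the latents are small NumPy vectors rather than real diffusion latents. The sketch repeatedly adjusts a noised latent until the posterior-mean estimate computed from it matches the source latent.

```python
import numpy as np


def toy_eps_model(z_t):
    # Hypothetical noise predictor (assumption); a real one would be a
    # pretrained diffusion network conditioned on a prompt and timestep.
    return 0.3 * np.tanh(z_t)


def posterior_mean(z_t, alpha_bar):
    # Tweedie-style estimate of the clean latent from the noised latent.
    eps_hat = toy_eps_model(z_t)
    return (z_t - np.sqrt(1.0 - alpha_bar) * eps_hat) / np.sqrt(alpha_bar)


def fixed_point_correct(z_src, z_t, alpha_bar, step=0.5, n_iters=50):
    # Illustrative update (assumption): move z_t along the residual between
    # the source latent and the current posterior-mean estimate.
    residuals = []
    for _ in range(n_iters):
        residual = z_src - posterior_mean(z_t, alpha_bar)
        residuals.append(float(np.linalg.norm(residual)))
        z_t = z_t + step * residual
    return z_t, residuals


rng = np.random.default_rng(0)
alpha_bar = 0.5
z_src = rng.standard_normal(8)

# Noised version of the source latent (standard forward process).
z_t0 = np.sqrt(alpha_bar) * z_src + np.sqrt(1.0 - alpha_bar) * rng.standard_normal(8)

z_t_star, residuals = fixed_point_correct(z_src, z_t0, alpha_bar)
final_gap = float(np.linalg.norm(z_src - posterior_mean(z_t_star, alpha_bar)))
print(f"initial gap: {residuals[0]:.4f}, final gap: {final_gap:.2e}")
```

With this toy predictor the update is a contraction, so the gap shrinks geometrically. With a real network, convergence is not guaranteed and the step size and iteration count would need tuning, which is in line with the sensitivity to FPR iterations and scale that the summary above reports.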
