FDS: Frequency-Aware Denoising Score for Text-Guided Latent Diffusion Image Editing
Yufan Ren, Zicong Jiang, Tong Zhang, Søren Forchhammer, Sabine Süsstrunk
TL;DR
This work addresses a limitation of text-guided editing in diffusion-based text-to-image (T2I) models: edits tend to alter all frequency content indiscriminately. It introduces a frequency-aware denoising score that uses discrete wavelet transforms to decompose latent representations into low- and high-frequency subbands, and it optimizes only selected subbands during editing. The approach enables accurate 2D image edits and extends to 3D texture editing through a frequency-decomposed triplane representation. Quantitative metrics and user studies show improved detail preservation and color fidelity. Because it requires no retraining of the diffusion model and offers fine-grained frequency control, the method provides a practical route to more reliable and controllable image and texture edits.
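The core mechanism named in the TL;DR, decomposing a latent with a discrete wavelet transform and optimizing only some subbands, can be illustrated with a minimal NumPy sketch. It assumes a single-level orthonormal Haar wavelet, a latent of shape (channels, H, W) with even H and W, and a plain gradient step. The names `haar_dwt2`, `haar_idwt2`, and `frequency_selective_step` are hypothetical, and this is not the authors' implementation or their exact score formulation.

```python
import numpy as np

def haar_dwt2(x):
    """One-level orthonormal 2D Haar DWT over the last two axes.

    x: array of shape (..., H, W) with even H and W.
    Returns subbands (LL, LH, HL, HH), each of shape (..., H/2, W/2).
    """
    a = x[..., 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # block average: low frequency
    lh = (a - b + c - d) / 2.0  # differences between columns
    hl = (a + b - c - d) / 2.0  # differences between rows
    hh = (a - b - c + d) / 2.0  # diagonal differences
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    a = (ll + lh + hl + hh) / 2.0
    b = (ll - lh + hl - hh) / 2.0
    c = (ll + lh - hl - hh) / 2.0
    d = (ll - lh - hl + hh) / 2.0
    shape = ll.shape[:-2] + (2 * ll.shape[-2], 2 * ll.shape[-1])
    out = np.empty(shape, dtype=ll.dtype)
    out[..., 0::2, 0::2] = a
    out[..., 0::2, 1::2] = b
    out[..., 1::2, 0::2] = c
    out[..., 1::2, 1::2] = d
    return out

BANDS = ("LL", "LH", "HL", "HH")

def frequency_selective_step(latent, grad, bands, lr=0.1):
    """Hypothetical editing step: apply grad only inside the chosen subbands.

    bands: subset of BANDS that is allowed to change; all others stay fixed.
    """
    lat = dict(zip(BANDS, haar_dwt2(latent)))
    g = dict(zip(BANDS, haar_dwt2(grad)))
    for name in bands:
        lat[name] = lat[name] - lr * g[name]
    return haar_idwt2(*(lat[name] for name in BANDS))

rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 64, 64))  # assumed latent shape, for illustration
grad = rng.standard_normal((4, 64, 64))    # random stand-in for a score gradient
edited = frequency_selective_step(latent, grad, bands={"LL"})
```

Because the Haar transform is orthonormal and exactly invertible, restricting the step to `LL` changes the coarse content while leaving the `LH`, `HL`, and `HH` subbands of the latent unchanged up to floating-point error. Updating only the high-frequency subbands would do the opposite.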
Abstract
Text-guided image editing using Text-to-Image (T2I) models often yields unsatisfactory results, introducing unintended modifications such as loss of local detail and color shifts. In this paper, we analyze these failure cases and attribute them to indiscriminate optimization across all frequency bands, even though only specific frequencies may require adjustment. To address this, we introduce a simple yet effective approach that selectively optimizes specific frequency bands within localized spatial regions, enabling precise edits. Our method uses wavelets to decompose images into multiple frequency bands at different spatial resolutions, allowing precise modifications at various levels of detail. To broaden the applicability of our approach, we also provide a comparative analysis of different frequency-domain techniques. We further extend our method to 3D texture editing by applying frequency decomposition to the triplane representation, which enables frequency-aware adjustments of 3D textures. Quantitative evaluations and user studies demonstrate that our method produces high-quality, precise edits.
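The abstract mentions both a comparison of frequency-domain techniques and edits confined to localized spatial regions. As one assumed alternative to wavelets, the sketch below splits a gradient into low- and high-frequency parts with an ideal radial FFT low-pass filter, then restricts the update to a binary spatial mask. The filter type, the cutoff value, the edit region, and the names `fft_split` and `localized_band_step` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def fft_split(x, cutoff):
    """Split x of shape (..., H, W) into low- and high-frequency parts.

    Uses an ideal radial low-pass filter; cutoff is a radius in cycles per
    sample (the largest radius on the FFT grid is about 0.707).
    """
    h, w = x.shape[-2:]
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    lowpass = (np.sqrt(fx ** 2 + fy ** 2) <= cutoff).astype(float)
    spectrum = np.fft.fft2(x, axes=(-2, -1))
    # The mask is symmetric in frequency, so the inverse is real up to rounding.
    low = np.real(np.fft.ifft2(spectrum * lowpass, axes=(-2, -1)))
    return low, x - low

def localized_band_step(latent, grad, region_mask, cutoff=0.1, lr=0.1, band="low"):
    """Hypothetical update limited to one frequency band and one spatial region."""
    low, high = fft_split(grad, cutoff)
    g = low if band == "low" else high
    return latent - lr * region_mask * g  # region_mask broadcasts over channels

rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 64, 64))
grad = rng.standard_normal((4, 64, 64))
region = np.zeros((64, 64))
region[16:48, 16:48] = 1.0  # hypothetical edit region
edited = localized_band_step(latent, grad, region, band="low")
```

A Fourier basis is global, so a band selected this way is not itself spatially localized, and multiplying by the mask afterwards reintroduces some high-frequency content along the mask boundary. Wavelet subbands are localized in both space and frequency, which is a standard argument for preferring them when edits must stay inside a region.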
