FreSca: Scaling in Frequency Space Enhances Diffusion Models
Chao Huang, Susan Liang, Yunlong Tang, Jing Bi, Li Ma, Yapeng Tian, Chenliang Xu
TL;DR
FreSca targets fine-grained control in latent diffusion models by manipulating, in the frequency domain, the classifier-free guidance noise difference $\Delta\epsilon_t$. The paper analyzes frequency representations across pixel space, VAE latent space, and internal LDM representations, identifies $\Delta\epsilon_t$ as a semantically rich target, and decomposes it into low- and high-frequency components with flexible cutoffs and independent scaling factors $l$ and $h$. The framework is model- and task-agnostic: it plugs into architectures such as SD3 and SDXL without retraining and applies to image generation, editing, depth estimation, and video synthesis. The authors report improved generation quality and structural emphasis across these tasks.
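To make the summary concrete, the scaling can be written out under the standard classifier-free guidance formulation. This is a sketch, not the paper's exact notation: the denoiser $\epsilon_\theta$, condition $c$, null condition $\varnothing$, guidance weight $w$, Fourier transform $\mathcal{F}$, and binary low-frequency mask $M$ are symbols assumed here for illustration.

$$
\Delta\epsilon_t = \epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing), \qquad
\hat{\epsilon}_t = \epsilon_\theta(x_t, \varnothing) + w\,\Delta\epsilon_t'
$$

$$
\Delta\epsilon_t' = \mathcal{F}^{-1}\!\big(\, l\, M \odot \mathcal{F}(\Delta\epsilon_t) \;+\; h\,(1 - M) \odot \mathcal{F}(\Delta\epsilon_t) \big)
$$

Under this formulation, setting $l = h = 1$ gives $\Delta\epsilon_t' = \Delta\epsilon_t$ and recovers ordinary classifier-free guidance.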
Abstract
Latent diffusion models (LDMs) have achieved remarkable success in a variety of image tasks, yet fine-grained, disentangled control over global structure versus fine detail remains challenging. This paper explores frequency-based control within latent diffusion models. We first systematically analyze frequency characteristics across pixel space, VAE latent space, and internal LDM representations. This analysis reveals that the "noise difference" term, derived from classifier-free guidance at each step $t$, is a uniquely effective and semantically rich target for manipulation. Building on this insight, we introduce FreSca, a novel plug-and-play framework that decomposes the noise difference into low- and high-frequency components and applies independent scaling factors to them via spatial or energy-based cutoffs. Importantly, FreSca requires no model retraining or architectural change, offering model- and task-agnostic control. We demonstrate its versatility and effectiveness in improving generation quality and structural emphasis on multiple architectures (e.g., SD3, SDXL) and across applications including image generation, editing, depth estimation, and video synthesis, thereby unlocking a new dimension of expressive control within LDMs.
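The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `fresca_scale` and `guided_noise`, the square spatial cutoff `cutoff_radius`, and the default scale values are all hypothetical choices made for illustration, and only the spatial-cutoff variant is shown (not the energy-based one).

```python
import numpy as np


def fresca_scale(delta_eps, low_scale=1.0, high_scale=1.25, cutoff_radius=4):
    """Scale low- and high-frequency bands of a (..., H, W) noise difference.

    Hypothetical helper: a centered square of half-width `cutoff_radius`
    in the shifted spectrum is treated as "low frequency".
    """
    h, w = delta_eps.shape[-2:]
    axes = (-2, -1)

    # 2D FFT over the spatial axes, with the zero frequency moved to the center.
    spectrum = np.fft.fftshift(np.fft.fft2(delta_eps, axes=axes), axes=axes)

    # Boolean mask selecting the low-frequency square around the center.
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    low_mask = (np.abs(yy - cy) <= cutoff_radius) & (np.abs(xx - cx) <= cutoff_radius)

    # Independent scale per band: low_scale inside the mask, high_scale outside.
    scale = np.where(low_mask, low_scale, high_scale)
    scaled = spectrum * scale

    # Back to the spatial domain; the imaginary part is numerical noise
    # because the mask is symmetric around the zero frequency.
    out = np.fft.ifft2(np.fft.ifftshift(scaled, axes=axes), axes=axes)
    return out.real


def guided_noise(eps_uncond, eps_cond, guidance_weight, low_scale, high_scale,
                 cutoff_radius=4):
    """Classifier-free guidance with a frequency-scaled noise difference."""
    delta_eps = eps_cond - eps_uncond
    delta_scaled = fresca_scale(delta_eps, low_scale, high_scale, cutoff_radius)
    return eps_uncond + guidance_weight * delta_scaled


# Usage with random stand-ins for the two denoiser outputs (4 latent channels).
rng = np.random.default_rng(0)
eps_u = rng.standard_normal((4, 64, 64))
eps_c = rng.standard_normal((4, 64, 64))
eps_hat = guided_noise(eps_u, eps_c, guidance_weight=7.5,
                       low_scale=1.0, high_scale=1.25)
```

With both scales set to 1 the sketch reduces to ordinary classifier-free guidance, which is consistent with the abstract's claim that no retraining or architectural change is needed: only the guidance term is modified at each sampling step.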
