AID: Attention Interpolation of Text-to-Image Diffusion
Qiyuan He, Jinghao Wang, Ziwei Liu, Angela Yao
TL;DR
This work tackles conditional interpolation in text-to-image diffusion models, where images generated between two text conditions must blend smoothly and coherently. It introduces AID, a training-free framework that improves interpolation by (i) applying fused inner/outer interpolated attention to both cross- and self-attention and (ii) using a Beta-distribution prior to sample interpolation points non-uniformly for smoother transitions. A prompt-guided variant, PAID, lets a user-provided prompt steer the interpolation path. Without any model training, the approach yields substantial gains in consistency, smoothness, and fidelity across benchmarks and downstream tasks, including image editing control and compositional generation. The key contributions are a formal analysis of the failures of text embedding interpolation (TEI), a practical fused-attention interpolation mechanism, a Beta-prior sampling strategy, and a prompt-guided extension enabling explicit path control, all demonstrated through comprehensive quantitative and human studies.
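The non-uniform sampling in (ii) can be illustrated with a minimal sketch. It assumes that interpolation points are placed at equally spaced quantiles of a Beta(a, b) distribution; the function names, the grid-based CDF inversion, and the parameter values are hypothetical choices for illustration, not the authors' implementation.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def beta_quantile_points(n_points, a, b, grid=10_000):
    """Return n_points values in [0, 1] at equally spaced Beta(a, b) quantiles.

    Hypothetical helper: the CDF is approximated with the midpoint rule on a
    uniform grid and inverted by linear interpolation inside each grid cell.
    """
    if n_points < 2:
        raise ValueError("need at least two interpolation points")
    # Cumulative sums of the density evaluated at cell midpoints.
    cdf = [0.0]
    for i in range(grid):
        cdf.append(cdf[-1] + beta_pdf((i + 0.5) / grid, a, b) / grid)
    total = cdf[-1]
    cdf = [c / total for c in cdf]  # normalise so the last entry is exactly 1

    points = []
    j = 0
    for k in range(n_points):
        target = k / (n_points - 1)
        # Advance to the grid cell whose CDF range contains the target.
        while j < grid - 1 and cdf[j + 1] < target:
            j += 1
        span = cdf[j + 1] - cdf[j]
        frac = (target - cdf[j]) / span if span > 0 else 0.0
        points.append((j + frac) / grid)
    return points


uniform = beta_quantile_points(5, 1.0, 1.0)   # Beta(1, 1) is the uniform case
centered = beta_quantile_points(5, 2.0, 2.0)  # points pulled toward the midpoint
```

With a = b = 1 the points are evenly spaced. In this sketch, a = b > 1 pulls them toward the midpoint and a = b < 1 pushes them toward the endpoints; the values that give the smoothest sequence for a given prompt pair are not specified here, so the ones above are placeholders.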
Abstract
Conditional diffusion models can create unseen images in various settings, aiding image interpolation. Interpolation in latent spaces is well-studied, but interpolation with specific conditions like text or poses is less understood. Simple approaches, such as linear interpolation in the space of conditions, often result in images that lack consistency, smoothness, and fidelity. To address this, we introduce a novel training-free technique named Attention Interpolation via Diffusion (AID). Our key contributions include 1) proposing an inner/outer interpolated attention layer; 2) fusing the interpolated attention with self-attention to boost fidelity; and 3) selecting interpolation points according to a Beta distribution to increase smoothness. We also present a variant, Prompt-guided Attention Interpolation via Diffusion (PAID), that considers interpolation as a condition-dependent generative process. This method enables the creation of new images with greater consistency, smoothness, and efficiency, and offers control over the exact path of interpolation. Our approach demonstrates effectiveness for conceptual and spatial interpolation. Code and demo are available at https://github.com/QY-H00/attention-interpolation-diffusion.
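The abstract does not spell out the attention formulas, so the following is only a sketch of one plausible reading. It assumes that "inner" interpolation blends the keys and values of the two endpoint conditions before the attention operation, that "outer" interpolation blends the two attention outputs after it, and that fusion with self-attention concatenates the image's own keys and values with the interpolated ones. All function names are hypothetical, and a single query vector is used for brevity.

```python
import math


def attention(q, keys, values):
    """Scaled dot-product attention for one query over lists of vectors."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / norm for d in range(dim)]


def lerp_rows(a, b, t):
    """Row-wise linear interpolation between two equally shaped matrices."""
    return [[(1 - t) * x + t * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def inner_interpolated_attention(q, kv_a, kv_b, t):
    """Assumed 'inner' form: interpolate keys/values, then attend once."""
    (k_a, v_a), (k_b, v_b) = kv_a, kv_b
    return attention(q, lerp_rows(k_a, k_b, t), lerp_rows(v_a, v_b, t))


def outer_interpolated_attention(q, kv_a, kv_b, t):
    """Assumed 'outer' form: attend to each endpoint, then interpolate outputs."""
    out_a = attention(q, *kv_a)
    out_b = attention(q, *kv_b)
    return [(1 - t) * x + t * y for x, y in zip(out_a, out_b)]


def fused_inner_attention(q, kv_self, kv_a, kv_b, t):
    """Assumed fusion: concatenate own keys/values with the interpolated ones."""
    k_self, v_self = kv_self
    (k_a, v_a), (k_b, v_b) = kv_a, kv_b
    keys = k_self + lerp_rows(k_a, k_b, t)
    values = v_self + lerp_rows(v_a, v_b, t)
    return attention(q, keys, values)


# Toy data: two tokens per condition, two-dimensional features.
q = [1.0, 0.0]
kv_a = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 0.0]])
kv_b = ([[0.0, 1.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 0.0]])
kv_self = ([[0.5, 0.5]], [[0.5, 0.5]])

inner_mid = inner_interpolated_attention(q, kv_a, kv_b, 0.5)
outer_mid = outer_interpolated_attention(q, kv_a, kv_b, 0.5)
fused_mid = fused_inner_attention(q, kv_self, kv_a, kv_b, 0.5)
```

Under these assumptions both forms reduce to ordinary attention on the first condition at t = 0 and on the second at t = 1; they differ only at intermediate t, where the softmax is applied either after blending (inner) or before it (outer).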
