
Bridging Diffusion Guidance and Anderson Acceleration via Hopfield Dynamics

Kwanyoung Kim

TL;DR

This work establishes a foundational framework for attention-space extrapolation by modeling attention dynamics as fixed-point iterations within Modern Hopfield Networks and proposes Geometry Aware Attention Guidance (GAG), a plug-and-play method that seamlessly integrates with existing frameworks while significantly improving generation quality.

Abstract

Classifier-Free Guidance (CFG) has significantly enhanced the generative quality of diffusion models by extrapolating between conditional and unconditional outputs. However, its high inference cost and limited applicability to distilled or single-step models have shifted research focus toward attention-space extrapolation. While these methods offer computational efficiency, their theoretical underpinnings remain elusive. In this work, we establish a foundational framework for attention-space extrapolation by modeling attention dynamics as fixed-point iterations within Modern Hopfield Networks. We demonstrate that the extrapolation effect in attention space constitutes a special case of Anderson Acceleration applied to these dynamics. Building on this insight and the weak contraction property, we propose Geometry Aware Attention Guidance (GAG). By decomposing attention updates into parallel and orthogonal components relative to the guidance direction, GAG stabilizes the acceleration process and maximizes guidance efficiency. Our plug-and-play method seamlessly integrates with existing frameworks while significantly improving generation quality.
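The abstract rests on two ideas: attention as a fixed-point iteration of a Modern Hopfield Network, and extrapolation as an Anderson-type acceleration step. The sketch below illustrates both under stated assumptions. It uses the standard Modern Hopfield retrieval rule x <- Xi softmax(beta Xi^T x), iterated to a fixed point. It also uses a depth-1 Anderson-type step whose mixing weight is fixed rather than fitted by least squares, which reduces it to the extrapolation g(x_k) + gamma (g(x_k) - g(x_{k-1})). The helper names (`hopfield_update`, `iterate`, `extrapolated_step`) and the toy patterns are hypothetical. The sketch does not reproduce the paper's exact correspondence between attention-space guidance and Anderson Acceleration.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def hopfield_update(x, patterns, beta):
    """One Modern Hopfield retrieval step: x <- Xi softmax(beta * Xi^T x).

    `patterns` holds the stored vectors (the columns of Xi). In attention
    terms, the patterns play the role of keys/values and x the query.
    """
    scores = [beta * sum(p_d * x_d for p_d, x_d in zip(p, x)) for p in patterns]
    weights = softmax(scores)
    return [sum(w * p[d] for w, p in zip(weights, patterns)) for d in range(len(x))]

def iterate(g, x0, steps):
    """Plain fixed-point iteration x_{k+1} = g(x_k)."""
    x = x0
    for _ in range(steps):
        x = g(x)
    return x

def extrapolated_step(g, x_prev, x_curr, gamma):
    """Depth-1 Anderson-type step with a fixed (not least-squares) weight.

    Classical Anderson acceleration mixes g(x_curr) and g(x_prev) with a weight
    fitted to the residuals g(x) - x; fixing that weight to -gamma gives
        x_next = g(x_curr) + gamma * (g(x_curr) - g(x_prev)).
    """
    g_curr, g_prev = g(x_curr), g(x_prev)
    return [a + gamma * (a - b) for a, b in zip(g_curr, g_prev)]

# Toy usage: two orthonormal stored patterns, query starting near the first.
patterns = [[1.0, 0.0], [0.0, 1.0]]
g = lambda x: hopfield_update(x, patterns, beta=8.0)
x_star = iterate(g, [0.9, 0.1], steps=20)
print([round(v, 4) for v in x_star])
```

At a fixed point x*, feeding x* as both iterates leaves the extrapolated step at g(x*) = x*. Away from it, gamma > 0 pushes the update further along the most recent change.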

Paper Structure (26 sections, 4 theorems, 22 equations, 3 figures, 3 tables)

Key Result

Theorem 3.1

Generalized Sparse Hopfield Retrieval Dynamics [hopfield, sparsehop, stanhop]. The retrieval dynamics of the generalized sparse Hopfield model is a monotonic one-step update:
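In the cited sparse Hopfield works, as we understand them, the softmax in the retrieval step is replaced by a sparse normalizer such as alpha-entmax, of which sparsemax is the alpha = 2 case. The sketch below assumes sparsemax and the update x <- Xi sparsemax(beta Xi^T x). It is an illustration, not the paper's Theorem 3.1 equation. The names `sparsemax` and `sparse_hopfield_update` and the toy patterns are hypothetical. Because sparsemax can assign exactly zero weight to distant patterns, retrieval can land on a stored pattern in a single step.

```python
def sparsemax(z):
    """Euclidean projection of the score vector z onto the probability simplex."""
    z_sorted = sorted(z, reverse=True)
    cumsum, tau = 0.0, 0.0
    for k, v in enumerate(z_sorted, start=1):
        cumsum += v
        # The support is the largest k with 1 + k * z_(k) > sum_{j<=k} z_(j);
        # the qualifying k form a prefix, so the last update fixes tau.
        if 1.0 + k * v > cumsum:
            tau = (cumsum - 1.0) / k
    return [max(v - tau, 0.0) for v in z]

def sparse_hopfield_update(x, patterns, beta):
    """One sparse retrieval step: x <- Xi sparsemax(beta * Xi^T x)."""
    scores = [beta * sum(p_d * x_d for p_d, x_d in zip(p, x)) for p in patterns]
    weights = sparsemax(scores)
    return [sum(w * p[d] for w, p in zip(weights, patterns)) for d in range(len(x))]

patterns = [[1.0, 0.0], [0.0, 1.0]]
x1 = sparse_hopfield_update([0.9, 0.1], patterns, beta=2.0)
x2 = sparse_hopfield_update(x1, patterns, beta=2.0)
print(x1, x2)
```

With the same query, a softmax update only approaches the first pattern asymptotically. Here the second pattern's weight is clipped to zero, so x1 is already a fixed point.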

Figures (3)

  • Figure 1: Qualitative comparison with and without our proposed method. (Top): Various guidance sampling methods, such as CFG [CFG] and APG [apg]. (Middle): Step-distilled models (e.g., Hyper-SDXL [hyper] and DMD2 [dmd2]). (Bottom): An additional backbone architecture (Flux.1 [flux2024]). Our method is compatible with various guidance methods, distilled models, and architectures without requiring additional training or computational overhead.
  • Figure 2: Qualitative impact of geometric components. Orthogonal-only fails to recover semantic structure; Full Residual shows reduced fidelity due to interference; Parallel-only ($\zeta = 0$) yields the highest quality, confirming the parallel component as the primary acceleration signal.
  • Figure 3: Analysis of the guidance scale $\lambda$.
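Figure 2 and Definition 3.5 describe splitting an attention update into components parallel and orthogonal to the guidance direction. Figure 2 labels the parallel-only variant as $\zeta = 0$. A minimal sketch follows, assuming $\zeta$ scales the orthogonal component ($\zeta = 1$ recovering the full residual) and the guidance scale $\lambda$ multiplies the recombined update. Which vectors play the roles of `u` (the update) and `d` (the guidance direction) in GAG is also an assumption. The names `decompose` and `geometry_aware_update` are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def decompose(u, d, eps=1e-12):
    """Split u into the component parallel to d and the orthogonal remainder."""
    coef = dot(u, d) / max(dot(d, d), eps)  # eps guards a zero direction
    par = [coef * v for v in d]
    orth = [a - b for a, b in zip(u, par)]
    return par, orth

def geometry_aware_update(base, u, d, lam, zeta):
    """Hypothetical guided update: base + lam * (u_par + zeta * u_orth)."""
    par, orth = decompose(u, d)
    return [b + lam * (p + zeta * o) for b, p, o in zip(base, par, orth)]

par, orth = decompose([3.0, 4.0], [1.0, 0.0])
print(par, orth)
print(geometry_aware_update([0.0, 0.0], [3.0, 4.0], [1.0, 0.0], lam=2.0, zeta=0.0))
```

Under this parameterization, $\zeta = 0$ keeps only the step along the guidance direction, matching the Parallel-only panel in name. Whether it matches the paper's exact formula is not established here.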

Theorems & Definitions (9)

  • Theorem 3.1
  • Proposition 3.2
  • Lemma 3.3: Fixed-Point Preservation
  • Proof
  • Remark 3.4: Divergence in Non-Common Fixed Points
  • Definition 3.5: Geometric Decomposition of Residuals
  • Theorem 3.7: Asymptotic Convergence of Orthogonal Error
  • Proof
  • Remark 3.8: Stability and Robustness
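Lemma 3.3 and Remark 3.4 concern fixed points of the extrapolated dynamics. A toy scalar illustration follows, assuming the extrapolated map has the form G(x) = g_weak(x) + lambda (g_strong(x) - g_weak(x)). This form is our assumption, not the paper's stated definition, and the affine maps below are hypothetical. When both maps share a fixed point, G keeps it. When they do not, the extrapolated iteration settles at a point that is a fixed point of neither map.

```python
def extrapolate(g_strong, g_weak, lam):
    """Extrapolated map G(x) = g_weak(x) + lam * (g_strong(x) - g_weak(x))."""
    return lambda x: g_weak(x) + lam * (g_strong(x) - g_weak(x))

def iterate(G, x0, steps=60):
    """Plain fixed-point iteration of G from x0."""
    x = x0
    for _ in range(steps):
        x = G(x)
    return x

g_strong = lambda x: 0.5 * x + 1.0        # fixed point 2.0
g_weak_common = lambda x: 0.8 * x + 0.4   # fixed point 2.0 (shared)
g_weak_other = lambda x: 0.8 * x + 0.2    # fixed point 1.0 (not shared)

lam = 3.0
x_common = iterate(extrapolate(g_strong, g_weak_common, lam), 0.0)
x_other = iterate(extrapolate(g_strong, g_weak_other, lam), 0.0)
print(x_common, x_other)
```

With lam = 3 both extrapolated maps have slope -0.1, so the iteration contracts. The shared case converges to 2.0, and the non-shared case converges to 2.6 / 1.1, which belongs to neither original map.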