Manifold-Optimal Guidance: A Unified Riemannian Control View of Diffusion Guidance

Zexi Jia, Pengcheng Luo, Zhengyao Fang, Jinchao Zhang, Jie Zhou

Abstract

Classifier-Free Guidance (CFG) serves as the de facto control mechanism for conditional diffusion, yet high guidance scales notoriously induce oversaturation, texture artifacts, and structural collapse. We attribute this failure to a geometric mismatch: standard CFG performs Euclidean extrapolation in ambient space, inadvertently driving sampling trajectories off the high-density data manifold. To resolve this, we present Manifold-Optimal Guidance (MOG), a framework that reformulates guidance as a local optimal control problem. MOG yields a closed-form, geometry-aware Riemannian update that corrects off-manifold drift without requiring retraining. Leveraging this perspective, we further introduce Auto-MOG, a dynamic energy-balancing schedule that adaptively calibrates guidance strength, effectively eliminating the need for manual hyperparameter tuning. Extensive validation demonstrates that MOG yields superior fidelity and alignment compared to baselines, with virtually no added computational overhead.
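For readers unfamiliar with the baseline, the Euclidean extrapolation that the abstract attributes to standard CFG can be sketched as follows. This is a minimal illustration of the widely used CFG combination rule only; it is not the MOG update, which the abstract does not specify. The function name `cfg_combine` and the two-dimensional toy vectors are assumptions chosen for illustration.

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, scale):
    """Standard CFG rule: move from the unconditional prediction along the
    straight line toward (and, for scale > 1, past) the conditional one."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Hypothetical toy predictions; a real model would produce image-shaped tensors.
eps_uncond = np.array([1.0, 0.0])
eps_cond = np.array([1.0, 0.5])

for scale in (1.0, 7.5, 15.0):
    guided = cfg_combine(eps_uncond, eps_cond, scale)
    # Distance between the guided prediction and the conditional prediction.
    overshoot = np.linalg.norm(guided - eps_cond)
    print(f"scale={scale:5.1f}  guided={guided}  overshoot={overshoot:.2f}")
```

At scale 1 the rule returns the conditional prediction exactly; larger scales move linearly further along the same ambient-space direction, with no term that accounts for the local geometry of the data. That straight-line behavior is what the abstract identifies as the source of off-manifold drift.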

Paper Structure

This paper contains 50 sections, 31 equations, 7 figures, 11 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Toy spiral manifold illustration of guidance geometry. Left: trajectories on a high-density spiral tube. Standard CFG (red) takes a Euclidean shortcut that leaves the manifold. A projected update (orange) stays closer to the manifold but progresses more slowly. MOG (blue) follows a geometry-aware direction that remains near the manifold while moving efficiently toward the condition. Top right: distance to the manifold over steps, highlighting off-manifold deviation under CFG. Bottom right: conditional energy over steps, showing that MOG reduces energy rapidly without leaving the high-density region.
  • Figure 2: Qualitative comparison across guidance methods. All images are generated with Stable Diffusion XL at guidance scale 15. Standard CFG manifests oversaturation and harsh textures. CFG++ yields washed-out results with low contrast. APG introduces a hazy appearance. In contrast, Auto-MOG achieves the most realistic results.
  • Figure 3: User preference study comparing Auto-MOG against standard CFG. Top row: model-based evaluation. Bottom row: domain-based evaluation. Auto-MOG achieves consistent preference advantages, particularly in Color and Realism.
  • Figure 4: Ablation on guidance scale and sampling steps.
  • Figure 5: Ablation on the Auto-MOG balance factor $\gamma$.
  • ...and 2 more figures