GeoGuide: Geometric guidance of diffusion models

Mateusz Poleski, Jacek Tabor, Przemysław Spurek

TL;DR

GeoGuide introduces a metric-based, norm-controlled guidance strategy for diffusion models to generate data from unseen classes, addressing the quality gap observed with ADM-G. By enforcing fixed-length updates $A_t=\frac{\sqrt{D}}{T}\frac{\nabla p(y|x)}{\|\nabla p(y|x)\|}$, GeoGuide keeps the denoising trajectory close to the data manifold, improving both fidelity and detail. Empirical results on ImageNet 256x256 show substantial FID improvements over ADM-G in unconditional ($7.32$ vs $12$) and conditional settings ($4.06$ vs $4.59$), and combining GeoGuide with a robust classifier yields further gains. The method is simple to implement, compatible with existing diffusion models, and demonstrates strong practical potential for high-quality, unlabeled-data-guided image synthesis.
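
As a rough illustration of the fixed-length update described above, the sketch below rescales a classifier gradient so that its norm is always $\sqrt{D}/T$. The function name `geoguide_update`, the `eps` guard for a vanishing gradient, and the array shapes are assumptions made for this example, not details taken from the paper; how the gradient is obtained and how the update is added to the sampler are left out.

```python
import numpy as np

def geoguide_update(grad, num_steps, eps=1e-12):
    """Rescale a classifier gradient to the fixed length sqrt(D) / T.

    grad      : gradient of the classifier output w.r.t. the image (any shape)
    num_steps : total number of denoising steps T
    eps       : hypothetical guard against dividing by a zero norm
    """
    grad = np.asarray(grad, dtype=np.float64)
    dim = grad.size                      # D: number of pixels times channels
    norm = np.linalg.norm(grad)          # Euclidean norm of the flattened gradient
    if norm < eps:
        return np.zeros_like(grad)       # no direction available, so no update
    return (np.sqrt(dim) / num_steps) * grad / norm

# Usage: even a tiny gradient yields an update of the same length.
rng = np.random.default_rng(0)
tiny_grad = rng.normal(size=(3, 8, 8)) * 1e-6
step = geoguide_update(tiny_grad, num_steps=250)
print(np.linalg.norm(step), np.sqrt(tiny_grad.size) / 250)
```

Because only the direction of the gradient is used, the magnitude of the update does not depend on how confident the classifier is at a given step.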

Abstract

Diffusion models are among the most effective methods for image generation. In particular, unlike GANs, they can easily be conditioned during training to produce samples of a desired class or with desired properties. However, guiding a pre-trained diffusion model to generate elements from previously unlabeled data is significantly more challenging. One possible solution is the ADM-G guidance approach. Although ADM-G successfully generates elements from the given class, there is a significant quality gap compared to a model originally conditioned on this class. In particular, the FID score obtained by the ADM-G-guided diffusion model is nearly three times higher (that is, worse) than that of the class-conditioned model. We demonstrate that this issue is partly due to ADM-G providing minimal guidance during the final stage of the denoising process. To address this problem, we propose GeoGuide, a guidance model based on tracing the distance of the diffusion model's trajectory from the data manifold. The main idea of GeoGuide is to produce normalized adjustments during the backward denoising process. As shown in the experiments, GeoGuide surpasses the probabilistic approach ADM-G with respect to both the FID scores and the quality of the generated images.

Paper Structure

This paper contains 18 sections, 22 equations, 11 figures, 5 tables, 2 algorithms.

Figures (11)

  • Figure 1: In our paper, we propose a new approach to guidance of diffusion models, called GeoGuide. In contrast to ADM-G (Dhariwal and Nichol, 2021), we use updates with the same norm and consequently keep the guided diffusion process close to the manifold of a given class. Observe that this allows us to construct images with more details characteristic of a given class, resulting in a decrease in the FID score from 12 in ADM-G to 7.32 in GeoGuide (see the results comparison table). The images were constructed with the same diffusion noise for ADM-G and GeoGuide.
  • Figure 2: Norm values of the gradient modification factor applied at each iteration of the classifier-guided diffusion sampling backward process. Comparison of an image generated with GeoGuide and three random images generated with ADM-G. Observe that in the case of the vanilla guidance (probabilistic approach), the norm of the modification at the last steps of the diffusion process is close to zero, which results in less detail in the produced images (see Figure 1).
  • Figure 3: Comparison of results when guidance is turned off after the first 30% of iterations vs fully guided samples. ADM-G is not effective during the last 70% of iterations, whereas GeoGuide still significantly improves the quality of the generated images.
  • Figure 4: Images generated by guided diffusion using the same noise (random seed) and class label, with vanilla guidance (Dhariwal and Nichol, 2021; FID 12.00, top) and geometric guidance (FID 7.32, bottom). Observe that images generated by GeoGuide are typically much more detailed. In our opinion, this is because the role of the classifier gradient is also important at the end of the backward process. In ADM-G, the norm of the modification at the last steps of the process is close to zero, while in GeoGuide it stays relevant during the entire process (see Figure 2).
  • Figure 5: Samples with vanilla classifier guidance (Dhariwal and Nichol, 2021; FID 12.00, left) vs samples with GeoGuide (FID 7.32, middle) and samples from the training set (right). The distribution of generated samples is comparable for both guidance methods, but significantly narrower than that of samples from the original dataset.
  • ...and 6 more figures
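
The contrast described in the Figure 2 and Figure 4 captions can be illustrated with a toy calculation. The sketch below assumes that vanilla classifier guidance shifts the mean by a term scaled by the per-step variance, and it assumes a linear variance schedule from 1e-4 to 0.02 over 1000 steps, a constant gradient norm, and a guidance scale of 1. All of these values, and the helper names, are hypothetical choices for illustration and are not the paper's measured curves.

```python
import numpy as np

def variance_scaled_norms(betas, grad_norm=1.0, scale=1.0):
    """Norm of an update of the form scale * beta_t * grad (assumed form)."""
    return scale * betas * grad_norm

def fixed_norms(num_steps, dim):
    """Norm of a fixed-length update sqrt(D) / T at every step."""
    return np.full(num_steps, np.sqrt(dim) / num_steps)

T = 1000                                  # assumed number of diffusion steps
D = 3 * 256 * 256                         # dimension of a 256x256 RGB image
# Assumed linear schedule; index 0 is the last (least noisy) denoising step.
betas = np.linspace(1e-4, 0.02, T)

vanilla = variance_scaled_norms(betas)
geometric = fixed_norms(T, D)

# Under these assumptions the variance-scaled update shrinks towards the end
# of sampling, while the fixed-length update stays constant.
print("variance-scaled, last step / first step:", vanilla[0] / vanilla[-1])
print("fixed-length,    last step / first step:", geometric[0] / geometric[-1])
```

In this toy setting the variance-scaled update at the final denoising step is a small fraction of its initial size, which is one way to read the near-zero norms reported for the vanilla guidance.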