Similarity Trajectories: Linking Sampling Process to Artifacts in Diffusion-Generated Images

Dennis Menn, Feng Liang, Hung-Yueh Chiang, Diana Marculescu

TL;DR

This work introduces the Similarity Trajectory, a time-series representation of the similarity between consecutive denoised images during diffusion-model sampling, to detect artifacts with minimal labeled data. By applying a Haar transform and extracting statistical features from time-domain and frequency-domain components, a Random Forest classifier is trained (with auxiliary k-NN probabilities) to predict artifact presence using only 680 labeled images, achieving 72.35% accuracy in 10-fold validation. The approach is grounded in DreamSim for similarity and leverages two diffusion frameworks (SD2 with DDIM and EDM2 with Heun) to define denoising steps, with a formal metric $D_{\text{max}}$ used to quantify trajectory declines associated with artifacts. Real-world evaluation against human judgments shows substantial alignment, demonstrating the method’s practical potential for artifact detection with scarce data and offering insights into model performance via trajectory consistency.
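
To make the definition concrete, the following is a minimal sketch of how a Similarity Trajectory could be assembled from the per-step denoised predictions. It rests on several assumptions that are not from the paper: `cosine_similarity` is only a stand-in for DreamSim, each denoised image is represented as a flat list of floats, and the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Stand-in similarity (assumption): the paper uses DreamSim, not cosine."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def similarity_trajectory(
    denoised: List[Vector],
    similarity: Callable[[Vector, Vector], float] = cosine_similarity,
) -> List[float]:
    """Similarity between each pair of consecutive denoised predictions.

    `denoised` holds one prediction per sampling step, in sampling order,
    so the result has len(denoised) - 1 entries.
    """
    if len(denoised) < 2:
        raise ValueError("need at least two denoised predictions")
    return [
        similarity(denoised[i], denoised[i + 1])
        for i in range(len(denoised) - 1)
    ]


if __name__ == "__main__":
    # Toy data (hypothetical): the prediction changes abruptly at step 3.
    toy = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    print(similarity_trajectory(toy))
```

In the toy input the second and third predictions are orthogonal, so the middle entry of the trajectory falls to zero while its neighbors stay at one. This is the kind of drop the TL;DR associates with artifacts.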

Abstract

Artifact detection algorithms are crucial to correcting the output generated by diffusion models. However, because artifacts take many forms, existing methods require substantial annotated data for training. This requirement limits their scalability and efficiency, which restricts their wide application. This paper shows that the similarity of denoised images between consecutive time steps during the sampling process is related to the severity of artifacts in images generated by diffusion models. Building on this observation, we introduce the concept of Similarity Trajectory to characterize the sampling process and its correlation with the artifacts present in the generated image. Using an annotated data set of 680 images, which is only 0.1% of the amount of data used in prior work, we trained a classifier on these trajectories to predict the presence of artifacts in images. Under 10-fold validation on the balanced annotated data set, the classifier achieves an accuracy of 72.35%, highlighting the connection between the Similarity Trajectory and the occurrence of artifacts. This approach enables differentiation between artifact-exhibiting and natural-looking images using limited training data.
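
The evaluation protocol mentioned above can be illustrated with a small sketch of a 10-fold split. The abstract does not say how the folds were drawn, so this is only one plausible setup: it assumes "balanced" means 340 images per class and that folds are stratified by label. The function names `stratified_folds` and `train_test_splits` are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Partition sample indices into k folds with per-class counts kept even.

    Assumption: stratification by label; the paper's fold construction
    is not described in the abstract.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin into the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train_test_splits(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for held_out, test in enumerate(folds):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


if __name__ == "__main__":
    # Hypothetical labels: 340 natural-looking (0) and 340 with artifacts (1).
    labels = [0] * 340 + [1] * 340
    for train, test in train_test_splits(stratified_folds(labels)):
        print(len(train), len(test))
```

Under these assumptions each held-out fold contains 68 images, 34 per class, and the remaining 612 images are used for training in that round.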

Paper Structure

This paper contains 20 sections, 15 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Comparison of an image exhibiting strong artifacts (left) with a more natural-looking image (right), alongside their corresponding Similarity Trajectories. The left image displays pronounced artifacts, particularly in the circled area where the subject's face blends unnaturally with their hair. This is reflected in its Similarity Trajectory, which is more erratic and shows a significant drop, as indicated by the red arrows. In contrast, the right image appears more natural, and its Similarity Trajectory is smoother and more consistent. The prompt for the images is "A student walking in front of the UT tower, with one hand holding a calculus book."
  • Figure 2: Flowchart illustrating the methodology for training a Random Forest (RF) classifier to detect artifacts in images based on the Similarity Trajectory. The process involves: (1) generating images and recording the denoised images $x_0^{(t)}$ at each time step; (2) calculating the similarity between consecutive denoised images $x_0^{(t)}$ and $x_0^{(t+1)}$ to construct the Similarity Trajectory; (3) applying a Haar transform to the Similarity Trajectory to obtain sets of detail coefficients, and dividing the original Similarity Trajectory into time-domain trajectory sets; (4) performing feature engineering by extracting statistical properties from each set; and (5) using the extracted features to train the RF classifier to classify the presence of artifacts in the generated images.
  • Figure 3: Artifact formation in the sampling process. The denoised images $x_0$ at various time steps illustrate that changes in the diffusion model's predictions between consecutive steps can cause overlapping objects. This overlap may distort the original shapes, leading to the presence of artifacts. The prompt for the image is "A man in a jacket and cowboy hat and a person on a horse".
  • Figure 4: Average Gini impurity reduction at each time step from the RF classifier. We input the raw Similarity Trajectory, without any transformations, into the RF classifier and analyze the average Gini impurity reduction.
  • Figure 5: Comparison of averaged trajectories under different conditions. (a) Effect of training progress on the same model, showing increased latent consistency over time. (b) Influence of model size on fully trained models, with larger models exhibiting greater adjacent latent similarity.
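
Steps (3) and (4) of the Figure 2 pipeline can be sketched as follows. This is a minimal illustration, not the authors' implementation. The orthonormal Haar normalization, the number of decomposition levels, the equal-length time-domain segments, and the choice of statistics (mean, standard deviation, minimum, maximum) are all assumptions, and every function name is hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def haar_step(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform on an even-length signal."""
    if len(signal) == 0 or len(signal) % 2 != 0:
        raise ValueError("signal length must be even and non-zero")
    scale = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_details(signal: Sequence[float], levels: int) -> List[List[float]]:
    """Detail-coefficient sets for `levels` levels, finest level first."""
    details = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        details.append(detail)
    return details


def split_segments(trajectory: Sequence[float], n_segments: int) -> List[List[float]]:
    """Equal-length time-domain segments (assumed splitting rule)."""
    if len(trajectory) % n_segments != 0:
        raise ValueError("trajectory length must be divisible by n_segments")
    size = len(trajectory) // n_segments
    return [list(trajectory[i:i + size]) for i in range(0, len(trajectory), size)]


def summary_features(values: Sequence[float]) -> List[float]:
    """Assumed statistics per set: mean, population std, min, max."""
    return [statistics.fmean(values), statistics.pstdev(values), min(values), max(values)]


def trajectory_features(trajectory: Sequence[float], levels: int = 2,
                        n_segments: int = 2) -> List[float]:
    """Concatenate statistics of the time-domain sets and the Haar detail sets."""
    sets = split_segments(trajectory, n_segments) + haar_details(trajectory, levels)
    features = []
    for values in sets:
        features.extend(summary_features(values))
    return features


if __name__ == "__main__":
    # Hypothetical trajectory with a single dip at the fifth step.
    example = [0.9, 0.9, 0.9, 0.9, 0.5, 0.9, 0.9, 0.9]
    print(trajectory_features(example))
```

With the defaults, an 8-step trajectory yields 16 features: four statistics for each of two time-domain segments and two detail sets. A perfectly flat trajectory produces all-zero detail coefficients, so non-zero details signal local changes such as the dip in the example. The resulting feature vectors would then be passed to a classifier as in step (5).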