Similarity Trajectories: Linking Sampling Process to Artifacts in Diffusion-Generated Images
Dennis Menn, Feng Liang, Hung-Yueh Chiang, Diana Marculescu
TL;DR
This work introduces the Similarity Trajectory, a time-series representation of the similarity between consecutive denoised images during diffusion-model sampling, to detect artifacts with minimal labeled data. A Haar transform is applied to each trajectory, and statistical features are extracted from its time-domain and frequency-domain components. A Random Forest classifier (with auxiliary k-NN probabilities) is then trained on these features to predict artifact presence using only 680 labeled images, achieving 72.35% accuracy in 10-fold cross-validation. The approach uses DreamSim as the similarity measure and two diffusion frameworks (SD2 with DDIM and EDM2 with Heun) to define the denoising steps, with a formal metric $D_{\text{max}}$ used to quantify the trajectory declines associated with artifacts. Real-world evaluation against human judgments shows substantial alignment, demonstrating the method’s practical potential for artifact detection with scarce data and offering insights into model performance via trajectory consistency.
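The feature-extraction idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_similarity` is a hypothetical stand-in for DreamSim, the "images" are short lists of floats, only a single Haar level is computed, and the particular statistics returned are assumptions chosen for illustration.

```python
import math
from statistics import mean, pstdev


def similarity_trajectory(images, similarity):
    """Similarity between each pair of consecutive denoised images."""
    return [similarity(a, b) for a, b in zip(images, images[1:])]


def haar_step(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / scale for a, b in pairs]  # low-frequency part
    detail = [(a - b) / scale for a, b in pairs]  # high-frequency part
    return approx, detail


def trajectory_features(traj):
    """Illustrative time-domain and Haar-domain statistics (assumed set)."""
    _, detail = haar_step(traj)
    return {
        "time_mean": mean(traj),
        "time_std": pstdev(traj),
        "time_min": min(traj),
        "detail_std": pstdev(detail),
        "detail_energy": sum(d * d for d in detail),
    }


def toy_similarity(a, b):
    """Hypothetical stand-in for DreamSim: 1 minus mean absolute difference."""
    return 1.0 - mean(abs(x - y) for x, y in zip(a, b))


# Five toy "denoised images" give a trajectory of four similarity values.
toy_images = [[0.0, 0.0], [0.5, 0.5], [0.75, 0.75], [0.8, 0.8], [0.8, 0.8]]
toy_traj = similarity_trajectory(toy_images, toy_similarity)
toy_features = trajectory_features(toy_traj)
```

In a full pipeline, feature dictionaries like `toy_features` would be computed for each labeled image and passed to a classifier such as the Random Forest mentioned above.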
Abstract
Artifact detection algorithms are crucial to correcting the output generated by diffusion models. However, because artifacts take many forms, existing methods require substantial annotated data for training. This requirement limits their scalability and efficiency and restricts their wide application. This paper shows that the similarity of denoised images between consecutive time steps during the sampling process is related to the severity of artifacts in images generated by diffusion models. Building on this observation, we introduce the concept of the Similarity Trajectory to characterize the sampling process and its correlation with the artifacts present in the generated image. Using an annotated data set of 680 images, which is only 0.1% of the amount of data used in prior work, we trained a classifier on these trajectories to predict the presence of artifacts in images. In 10-fold cross-validation on the balanced annotated data set, the classifier achieves an accuracy of 72.35%, highlighting the connection between the Similarity Trajectory and the occurrence of artifacts. This approach enables differentiation between artifact-exhibiting and natural-looking images using limited training data.
