Detecting Human Artifacts from Text-to-Image Models
Kaihong Wang, Lingzhi Zhang, Jianming Zhang
TL;DR
This work tackles human-related artifacts in text-to-image generation by creating the Human Artifact Dataset (HAD) and training specialized detectors (HADM) for local and global artifacts. The authors demonstrate that HADM generalizes across unseen generators and can guide diffusion model finetuning and automated inpainting to reduce artifacts, improving human structural coherence. They further validate the approach with extensive experiments, ablations, and a user study, and release both the dataset and models for broad use. The work offers a practical feedback loop to enhance synthetic image quality and provides robust benchmarks for evaluating human artifacts across diverse T2I models.
Abstract
Despite recent advancements, text-to-image generation models often produce images containing artifacts, especially in human figures. These artifacts appear as poorly generated human bodies, including distorted, missing, or extra body parts, leading to visual inconsistencies with typical human anatomy and greatly impairing overall fidelity. In this study, we address this challenge by curating the Human Artifact Dataset (HAD), a diverse dataset specifically designed to localize human artifacts. HAD comprises over 37,000 images generated by several popular text-to-image models, annotated for human artifact localization. Using this dataset, we train the Human Artifact Detection Models (HADM), which can identify different artifacts across multiple generative domains and demonstrate strong generalization, even on images from unseen generators. Additionally, to further improve generators' perception of human structural coherence, we use the predictions from our HADM as feedback for diffusion model finetuning. Our experiments confirm a reduction in human artifacts in the resulting model. Furthermore, we showcase a novel application of our HADM in an iterative inpainting framework that directly corrects human artifacts in arbitrary images, demonstrating its utility in improving image quality. Our dataset and detection models are available at: https://github.com/wangkaihong/HADM.
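The iterative inpainting idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the released HADM code or its API: the names `iterative_repair`, `boxes_to_mask`, `toy_detect`, and `toy_inpaint`, the box format, the score threshold, and the round limit are all assumptions made for illustration. The sketch only shows the control flow one might expect, where a detector proposes artifact regions, the regions are turned into a mask, an inpainter regenerates the masked pixels, and the loop stops once no confident detection remains.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), upper bounds exclusive
Detection = Tuple[Box, float]     # (box, confidence score)
Image = List[List[int]]           # toy grayscale image as nested lists
Mask = List[List[bool]]


def boxes_to_mask(boxes: List[Box], height: int, width: int) -> Mask:
    """Rasterize boxes into a boolean mask; True marks pixels to regenerate."""
    mask = [[False] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = True
    return mask


def iterative_repair(
    image: Image,
    detect: Callable[[Image], List[Detection]],   # hypothetical detector interface
    inpaint: Callable[[Image, Mask], Image],      # hypothetical inpainter interface
    score_threshold: float = 0.5,                 # assumed value, not from the paper
    max_rounds: int = 3,                          # assumed value, not from the paper
) -> Tuple[Image, int]:
    """Alternate detection and inpainting until no confident artifact remains.

    Returns the final image and the number of inpainting rounds performed.
    """
    rounds = 0
    for _ in range(max_rounds):
        boxes = [box for box, score in detect(image) if score >= score_threshold]
        if not boxes:
            break
        mask = boxes_to_mask(boxes, len(image), len(image[0]))
        image = inpaint(image, mask)
        rounds += 1
    return image, rounds


# Toy stand-ins so the loop can be run without any model:
# a pixel value of 255 plays the role of an "artifact".
def toy_detect(image: Image) -> List[Detection]:
    return [((x, y, x + 1, y + 1), 0.9)
            for y, row in enumerate(image)
            for x, value in enumerate(row) if value == 255]


def toy_inpaint(image: Image, mask: Mask) -> Image:
    return [[0 if mask[y][x] else value for x, value in enumerate(row)]
            for y, row in enumerate(image)]


repaired, rounds = iterative_repair([[10, 255], [255, 20]], toy_detect, toy_inpaint)
```

In a real pipeline, `detect` would wrap a trained artifact detector and `inpaint` a diffusion inpainting model; capping the number of rounds guards against a loop in which each repair introduces a new artifact.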
