
Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body

Zeqing Wang, Qingyang Ma, Wentao Wan, Haojie Li, Keze Wang, Yonghong Tian

TL;DR

HumanCalibrator is a meticulous framework that identifies and repairs abnormalities in human body structures while preserving the rest of the visual content. It achieves high accuracy in abnormality detection and yields improvements in visual comparisons.

Abstract

Recent advances in visual synthesis have significantly improved the depiction of generated human photos, which are important because of their wide applicability and demand. Nonetheless, existing text-to-image and text-to-video models often generate low-quality human photos whose body structures differ considerably from those of real people; we refer to these as "abnormal human bodies". Such abnormalities are typically deemed unacceptable, and detecting and repairing them within human photos is considerably challenging. Doing so requires precise abnormality recognition, i.e., pinpointing both the location and the type of each abnormality. Intuitively, Visual Language Models (VLMs), which have achieved remarkable performance on various visual tasks, should be well suited to this task. However, they perform poorly at abnormality detection in human photos, which makes it important to bring this task to the attention of the research community. In this paper, we first introduce a simple yet challenging task, Fine-grained Human-body Abnormality Detection (FHAD), and construct two high-quality datasets for evaluation. We then propose a meticulous framework, HumanCalibrator, which identifies and repairs abnormalities in human body structures while preserving the other content. Experiments indicate that HumanCalibrator achieves high accuracy in abnormality detection and yields improvements in visual comparisons while preserving the remaining visual content.

Paper Structure

This paper contains 26 sections, 8 equations, 14 figures, and 3 tables.

Figures (14)

  • Figure 1: Fine-Grained Human-body Abnormality Detection (FHAD). Human body structures in AIGC often deviate significantly from those of real people, making them easily recognizable as abnormal to human observers. However, current powerful VLMs typically struggle to perceive these abnormalities despite excelling at various downstream perceptual tasks. This gap makes fine-grained abnormality detection challenging and motivates our research.
  • Figure 2: Fine-Grained Human-body Abnormality Detection (FHAD) (★) is a novel task. It is distinct from both AIGC detection and AIGC quality assessment: its objective is to identify where content generated by AIGC methods deviates from the real world. Additionally, fine-grained detection requires methods that can report both what the abnormalities are and where they are located.
  • Figure 3: Examples in AIGC Human-Aware 1K. We manually annotate the abnormalities in frames from generated AIGC videos. Because the exact location of an abnormality is ambiguous, we do not annotate bounding boxes; instead, we evaluate bounding-box localization indirectly by assessing the repair quality.
  • Figure 4: Absent Human-body Detector (AHD) training strategy. In the real world, many objects in a scene are interrelated, so the presence of certain objects at specific locations can be inferred from the other objects. Our proposed training strategy exploits the correlation between body parts to facilitate training.
  • Figure 5: Illustration of HumanCalibrator. HumanCalibrator consists of two stages: perception and regeneration. In the perception stage, HumanCalibrator first uses an inpainting model to regenerate individual body parts based on the model's overall understanding of human body structure, and identifies redundant body parts by comparing semantic differences before and after inpainting. It then relies on our Absent Human-body Detector (AHD) to perceive absent-part abnormalities, applying AHD cyclically to identify absent body parts. Finally, by feeding the results of the perception stage into the inpainting model as prompts, HumanCalibrator repairs the detected abnormalities while preserving the visual content of the remaining areas.
  • ...and 9 more figures
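The Figure 4 caption describes the AHD training idea only at a high level. One plausible way to turn "correlation between body parts" into training data is a leave-one-part-out construction: hide one annotated part and keep its box as the target, so the detector has to infer the missing part from the remaining ones. The sketch below is an assumption, not the paper's procedure; `erase`, `make_absent_samples`, the toy list-of-lists grayscale image, and the constant fill value are all hypothetical. A real pipeline would more likely fill the hidden region with an inpainting model, so that the detector cannot simply learn to look for blank patches.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive
Image = List[List[int]]  # toy 2-D grayscale image, indexed image[y][x]


def erase(image: Image, box: Box, fill: int = 0) -> Image:
    """Return a copy of `image` with `box` painted over by a constant `fill`.

    Hypothetical stand-in for hiding a part; an inpainter is the likelier choice.
    """
    out = [row[:] for row in image]  # copy rows so the input stays unchanged
    x0, y0, x1, y1 = box
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = fill
    return out


def make_absent_samples(image: Image, parts: Dict[str, Box],
                        hide: Callable[[Image, Box], Image] = erase
                        ) -> List[Tuple[Image, str, Box]]:
    """Leave-one-part-out: hide each annotated part in turn and keep its name
    and box as the target, so a detector must infer the missing part from the
    remaining, correlated parts (e.g. a wrist suggests a hand nearby)."""
    return [(hide(image, box), part, box) for part, box in parts.items()]


img = [[1] * 4 for _ in range(4)]
samples = make_absent_samples(img, {"head": (0, 0, 2, 1), "hand": (2, 2, 4, 4)})
```

Each sample pairs an image with one part hidden and that part's label and box; training a box-predicting detector on such pairs is one way, under these assumptions, to teach it to locate parts that should exist but do not.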
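The Figure 5 caption describes a three-step control flow: flag redundant parts, cyclically search for absent parts, then repair both with prompted inpainting. The following minimal sketch mirrors that flow under several assumptions. The model interfaces (`Inpaint`, `SemanticDiff`, `DetectAbsent`), the rule that a large semantic change after unprompted inpainting marks a part as redundant, the "fill in and re-run" reading of the cyclical AHD strategy, and the `"background"` repair prompt are all hypothetical, not HumanCalibrator's actual API. The toy stand-ins at the bottom represent an image as a dict from boxes to part labels purely so the control flow can run.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


@dataclass(frozen=True)
class Abnormality:
    kind: str  # "redundant" or "absent"
    part: str  # body-part name, e.g. "hand"
    box: Box


# Hypothetical model interfaces (assumptions, not HumanCalibrator's real API).
Inpaint = Callable[[Any, Box, Optional[str]], Any]  # (image, region, prompt) -> new image
SemanticDiff = Callable[[Any, Any, Box], float]  # (before, after, region) -> distance
DetectAbsent = Callable[[Any], List[Tuple[str, Box]]]  # image -> [(part, box)]


def find_redundant(image: Any, parts: Dict[str, List[Box]], inpaint: Inpaint,
                   diff: SemanticDiff, threshold: float) -> List[Abnormality]:
    """Regenerate each part region with no prompt; if the inpainter's structure
    prior replaces the part with something semantically different, flag it."""
    found = []
    for part, boxes in parts.items():
        for box in boxes:
            regenerated = inpaint(image, box, None)
            if diff(image, regenerated, box) > threshold:
                found.append(Abnormality("redundant", part, box))
    return found


def find_absent(image: Any, detect: DetectAbsent, inpaint: Inpaint,
                max_rounds: int = 5) -> List[Abnormality]:
    """Cyclic AHD search: record new detections, provisionally fill them in,
    and re-run the detector until it reports nothing new."""
    found: List[Abnormality] = []
    for _ in range(max_rounds):
        new = [Abnormality("absent", p, b) for p, b in detect(image)]
        new = [a for a in new if a not in found]
        if not new:
            break
        for a in new:
            found.append(a)
            image = inpaint(image, a.box, a.part)  # so the next round sees it
    return found


def calibrate(image: Any, parts: Dict[str, List[Box]], inpaint: Inpaint,
              diff: SemanticDiff, detect: DetectAbsent, threshold: float = 0.5):
    """Perception (redundant + absent), then regeneration prompted by the results."""
    issues = find_redundant(image, parts, inpaint, diff, threshold)
    issues += find_absent(image, detect, inpaint)
    for a in issues:
        prompt = "background" if a.kind == "redundant" else a.part
        image = inpaint(image, a.box, prompt)
    return image, issues


# --- Toy stand-ins, for illustration only: an "image" is a dict {box: label}. ---
PRIOR = {(0, 0, 1, 1): "head", (0, 1, 1, 2): "torso", (1, 1, 2, 2): "hand"}


def toy_inpaint(img, box, prompt):
    out = dict(img)  # never mutate the caller's image
    out[box] = prompt if prompt is not None else PRIOR.get(box, "background")
    return out


def toy_diff(before, after, box):
    return 0.0 if before.get(box) == after.get(box) else 1.0


def toy_detect(img):
    return [(part, box) for box, part in PRIOR.items() if img.get(box) != part]


image = {(0, 0, 1, 1): "head", (0, 1, 1, 2): "torso", (2, 2, 3, 3): "hand"}
parts = {"head": [(0, 0, 1, 1)], "torso": [(0, 1, 1, 2)], "hand": [(2, 2, 3, 3)]}
fixed, issues = calibrate(image, parts, toy_inpaint, toy_diff, toy_detect)
```

With these toy models, the misplaced hand at `(2, 2, 3, 3)` is flagged as redundant and the hand the prior expects at `(1, 1, 2, 2)` is flagged as absent; both are then repaired. Keeping detection and repair as separate passes matches the caption's perception-then-regeneration split and leaves the input image untouched.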