DiffBody: Human Body Restoration by Imagining with Generative Diffusion Prior

Yiming Zhang; Zhe Wang; Xinjie Li; Yunchen Yuan; Chengsong Zhang; Xiao Sun; Zhihang Zhong; Jian Wang

TL;DR

DiffBody addresses artifacts in human body restoration that arise when general restoration models are applied to portraits and body images. It introduces a body-aware diffusion framework that combines pose and attention priors, text guidance from GPT-4V-generated captions, and a body-part-aware diffusion sampler, trained on a new 140k-image dataset assembled from SHHQ, DeepFashion, and Web-human sources. The authors report that the method outperforms state-of-the-art baselines on quantitative metrics (e.g., SSIM, LPIPS, MANIQA, CLIPIQA) and in qualitative assessments, including a user study. The work makes human body restoration more practical through structured multimodal conditioning and opens avenues for finer control and identity-preserving restoration.

Abstract

Human body restoration plays a vital role in various applications related to the human body. Despite recent advances in general image restoration using generative models, their performance in human body restoration remains mediocre, often resulting in blended foreground and background, over-smoothed surface textures, missing accessories, and distorted limbs. To address these challenges, we propose a human body-aware diffusion model that leverages domain-specific knowledge to enhance performance. Specifically, we employ a pretrained body attention module to guide the diffusion model's focus on the foreground, addressing issues caused by blending between the subject and background. We also demonstrate the value of revisiting the language modality of the diffusion model in restoration tasks by seamlessly incorporating text prompts to improve the quality of surface texture and of clothing and accessory details. Additionally, we introduce a diffusion sampler tailored for fine-grained human body parts, utilizing local semantic information to rectify limb distortions. Lastly, we collect a comprehensive dataset for benchmarking and advancing the field of human body restoration. Extensive experimental validation demonstrates the superiority of our approach, both quantitatively and qualitatively, over existing methods.

Paper Structure (19 sections, 6 equations, 17 figures, 1 table, 1 algorithm)

Introduction
Related Work
Blind Image Restoration
Controllable Human Image Generation
Datasets for Human Image Generation
Methodology
Preliminary
Enhancing Human Image Restoration through Structural Guidance
Leveraging Textual Information for Image Restoration
Human-centric Guidance for Diffusion Sampling
Experiment
Datasets
Experimental Details
Comparisons with State-of-the-Art Methods
Ablation Studies
...and 4 more sections

Figures (17)

Figure 1: Comparison between our model and the baseline (Left: Baseline, Right: Ours, Top left corner: LQ input). Compared to the baseline, our model performs better on the problems labeled below each image.
Figure 1: Detailed prompt we provide to GPT-4V to caption our dataset.
Figure 2: The structure of DiffBody. First, we train the SwinIR model on our proposed dataset and use the trained model to process the low-quality image $I_{LQ}$, obtaining a preliminary restored image $I_{reg}$. A pose map $I_{pose}$ and an attention map $I_{attn}$ are then extracted from $I_{reg}$ using existing methods. Next, $I_{reg}$ and $I_{pose}$ are passed into the pre-trained VAE encoder, and the resulting latents are concatenated with $I_{attn}$ and fed to the trainable copy of the SD encoder. We also utilize textual information (Sec 3.3) and a novel human-centric sampling method (Sec 3.4) to enhance the restoration capability. Please see the corresponding sections for details.
Figure 2: Visual comparison of DiffBody and other general SOTA methods. Compared to other methods, our model is more effective in generating detailed limbs.
Figure 3: During training, texts in black are fed to the model. Texts in green reflect the generative logic of GPT-4V in captioning images.
...and 12 more figures
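
The conditioning step described in the structure figure (encode $I_{reg}$ and $I_{pose}$, then concatenate the latents with $I_{attn}$) can be illustrated with a minimal, dependency-free sketch. Everything below is an assumption made for illustration and not the authors' implementation: the latents are stand-in nested lists, not real VAE outputs, the attention map is resized by average pooling, and the names `avg_pool`, `build_condition`, and `constant_plane`, the downsampling factor of 8, and the 4-channel latents are hypothetical choices.

```python
from typing import List

Plane = List[List[float]]  # one H x W channel stored as nested lists


def constant_plane(h: int, w: int, value: float) -> Plane:
    """Hypothetical helper: build an h x w channel filled with one value."""
    return [[value] * w for _ in range(h)]


def avg_pool(plane: Plane, factor: int) -> Plane:
    """Downsample a channel by averaging non-overlapping factor x factor blocks."""
    h, w = len(plane), len(plane[0])
    if h % factor or w % factor:
        raise ValueError("plane size must be divisible by factor")
    out = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [plane[y][x]
                     for y in range(i, i + factor)
                     for x in range(j, j + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def build_condition(z_reg: List[Plane], z_pose: List[Plane],
                    attn: Plane, factor: int = 8) -> List[Plane]:
    """Concatenate two latents and a resized attention map along the channel axis.

    Assumption: the attention map lives at image resolution and is brought to
    latent resolution by average pooling; the source does not specify this.
    """
    attn_small = avg_pool(attn, factor)
    shape = (len(attn_small), len(attn_small[0]))
    for channel in z_reg + z_pose:
        if (len(channel), len(channel[0])) != shape:
            raise ValueError("spatial size mismatch between latents and attention map")
    # Channel-wise concatenation: list concatenation over channels.
    return z_reg + z_pose + [attn_small]


# Stand-in data: 4-channel 2x2 "latents" and a 16x16 attention map.
z_reg = [constant_plane(2, 2, 0.0) for _ in range(4)]
z_pose = [constant_plane(2, 2, 1.0) for _ in range(4)]
attn = constant_plane(16, 16, 0.5)
cond = build_condition(z_reg, z_pose, attn)
print(len(cond), len(cond[0]), len(cond[0][0]))
```

Under these assumptions, the condition fed to the trainable encoder copy has one more channel than the two latents combined, so the first layer of that encoder would need a matching number of input channels.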
