FRRffusion: Unveiling Authenticity with Diffusion-Based Face Retouching Reversal
Fengchuang Xing, Xiaowen Shi, Yuan-Gen Wang, Chunsheng Yang
TL;DR
This work introduces Face Retouching Reversal (FRR), the task of recovering authentic facial appearance from retouched images, to address the rising risk of deceptive online content. It builds the first FRR dataset, deepFRR, from StyleGAN-generated faces retouched via a commercial API, and presents FRRffusion, a two-stage framework: a diffusion-based Facial Morpho-Architectonic Restorer (FMAR) recovers coarse, low-resolution facial structure, and a Transformer-based Hyperrealistic Facial Detail Generator (HFDG) synthesizes high-resolution detail. Across four evaluation metrics (PSNR, SSIM, VGGS, CLIPS) and multiple datasets, FRRffusion consistently outperforms GP-UNIT and Stable Diffusion in both quantitative scores and qualitative perceptual assessments, including a subjective study with 85 participants. The results suggest that FRRffusion bridges FRR with existing restoration tasks and has practical potential for authenticity verification in advertising and legal contexts. The work also outlines avenues for improvement and for robustness against evolving retouching technologies.
Abstract
Unveiling the real appearance of retouched faces, to keep malicious users from engaging in deceptive advertising and economic fraud, has become an increasing concern in the era of the digital economy. This article makes the first attempt to investigate the face retouching reversal (FRR) problem. We first collect an FRR dataset, named deepFRR, which contains 50,000 StyleGAN-generated high-resolution (1024×1024) facial images and their corresponding retouched versions produced by a commercial online API. To the best of our knowledge, deepFRR is the first FRR dataset tailored for training deep FRR models. Then, we propose a novel diffusion-based approach, FRRffusion, for the FRR task. FRRffusion consists of a coarse-to-fine two-stage network: a diffusion-based Facial Morpho-Architectonic Restorer (FMAR) generates the basic contours of low-resolution faces in the first stage, while a Transformer-based Hyperrealistic Facial Detail Generator (HFDG) creates high-resolution facial details in the second stage. Tested on deepFRR, FRRffusion surpasses the GP-UNIT and Stable Diffusion methods by a large margin on four widely used quantitative metrics. In particular, in a qualitative evaluation with 85 subjects, the images de-retouched by FRRffusion are visually much closer to the raw face images than both the retouched face images and those restored by the GP-UNIT and Stable Diffusion methods. These results validate the efficacy of our work, bridging the gap between FRR and generic image restoration tasks. The dataset and code are available at https://github.com/GZHU-DVL/FRRffusion.
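To make the quantitative comparison concrete, the following is a minimal sketch of PSNR, one of the four metrics named in the TL;DR, using its standard definition. It is not taken from the released code: the function name `psnr`, the flat-list image representation, and the toy pixel values are assumptions made purely for illustration.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are flat sequences of pixel intensities (an assumption
    made to keep this sketch dependency-free).
    """
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    if len(reference) == 0:
        raise ValueError("images must not be empty")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 4-pixel "images", not data from deepFRR.
raw = [52, 55, 61, 59]
retouched = [62, 65, 71, 69]  # every pixel differs from raw by 10
restored = [53, 54, 62, 58]   # every pixel differs from raw by 1

print(psnr(raw, retouched))
print(psnr(raw, restored))
```

A higher PSNR against the raw image means the candidate is closer to it pixel-wise, so in this toy case the restored image scores higher than the retouched one.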
