DiffClean: Diffusion-based Makeup Removal for Accurate Age Estimation

Ekta Balkrishna Gavas, Sudipta Banerjee, Chinmay Hegde, Nasir Memon

TL;DR

DiffClean tackles makeup-induced biases in automated age estimation and face verification by deploying a reference-free, text-guided diffusion model. It uses a multi-loss framework (CLIP makeup loss, biometric identity loss, perceptual loss, and age loss) to remove makeup traces while preserving age- and identity-related cues, enabling improved downstream biometric tasks. Across synthetic and real makeup datasets, it achieves notable gains in minor/adult age accuracy and TMR, while maintaining high image quality and generalizing to diverse makeup styles; ablations and fairness analyses support its robustness. The work also demonstrates practical deployment benefits as a pre-processing module for online age verification, with open-source code and thorough analyses of hyperparameters, diversity, and hallucination risk.
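One way to picture the multi-loss framework is as a weighted sum of the four terms named above. The following is a minimal sketch under that assumption; the `LossWeights` class, the `total_loss` function, and the default weights are hypothetical and are not taken from the paper or its code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    # Hypothetical weights, chosen only for illustration.
    clip: float = 1.0
    identity: float = 1.0
    perceptual: float = 1.0
    age: float = 1.0


def total_loss(clip_loss, id_loss, perceptual_loss, age_loss, w=LossWeights()):
    """Combine the four loss terms as a weighted sum (an assumed form)."""
    return (
        w.clip * clip_loss            # pushes the image toward "without makeup"
        + w.identity * id_loss        # keeps the face matchable to the input
        + w.perceptual * perceptual_loss  # keeps the image visually close
        + w.age * age_loss            # keeps age-related cues intact
    )
```

Raising one weight relative to the others trades makeup removal strength against preservation of identity, appearance, or age cues.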

Abstract

Accurate age verification can protect underage users from unauthorized access to online platforms and e-commerce sites that provide age-restricted services. However, age estimation can be confounded by several factors, including facial makeup, which can alter perceived identity and age and thereby fool both humans and machines. In this work, we propose DiffClean, which erases makeup traces using a text-guided diffusion model to defend against makeup attacks and, unlike prior work, does not require a reference image. Compared to images with makeup, DiffClean improves age estimation (minor vs. adult accuracy by 5.8%) and face verification (TMR by 5.1% at FMR = 0.01%). Our method (1) is robust across digitally simulated and real-world makeup styles with high visual fidelity, (2) can be easily integrated as a pre-processing module in existing age and identity verification frameworks, and (3) advances the state of the art in terms of biometric and perceptual utility. Our code is available at https://github.com/Ektagavas/DiffClean
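The face verification figure is reported as the True Match Rate (TMR) at a fixed False Match Rate (FMR). A minimal sketch of how such an operating point can be computed from match scores is given below. It assumes that higher scores mean greater similarity; the function name `tmr_at_fmr` and the thresholding rule are illustrative and not the authors' evaluation code.

```python
def tmr_at_fmr(genuine_scores, impostor_scores, target_fmr):
    """Return the TMR at a threshold whose FMR does not exceed target_fmr.

    Assumes higher score = more similar. A pair is accepted when its
    score is strictly greater than the threshold.
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")

    impostors = sorted(impostor_scores, reverse=True)
    # Number of impostor pairs that may be (wrongly) accepted.
    allowed = int(target_fmr * len(impostors))
    # Setting the threshold at the (allowed+1)-th highest impostor score
    # means at most `allowed` impostors score strictly above it.
    threshold = impostors[allowed] if allowed < len(impostors) else float("-inf")

    accepted = sum(1 for s in genuine_scores if s > threshold)
    return accepted / len(genuine_scores)
```

At FMR = 0.01% the threshold admits at most one impostor pair in every 10,000, so a meaningful estimate requires a large set of impostor comparisons.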

Paper Structure

This paper contains 24 sections, 6 equations, 11 figures, 7 tables.

Figures (11)

  • Figure 1: Overview of our method, DiffClean, which removes makeup from an example FFHQ image using a text-guided diffusion model with a combination of CLIP loss, identity (ID) loss, age loss, and perceptual (visual) loss. Successful makeup removal shifts the softmax probabilities $[p(\text{with makeup}), p(\text{without makeup})]$ from $[0.76, 0.24]$ to $[0.13, 0.87]$. Here, we use ViT-B/32 as the CLIP-based classifier. DiffClean maintains biometric and perceptual quality while reducing the overestimated age from 14.6 yrs (with makeup) to 10.4 yrs (without makeup).
  • Figure 2: Synthetic makeup transfer results produced using EleGANt [yang2022elegant] on example images from UTKFace. Original image (first column), makeup style reference (second column), and makeup-transferred image (third column).
  • Figure 3: Comparison of makeup removal results generated by six baselines and our proposed DiffClean (last two columns) on three example images from the FFHQ dataset. GAN-based baselines (BeautyGAN, LADN, PSGAN++) introduce visual artifacts; CLIP2Protect alters hair color and style; DiffAM does not effectively remove makeup; and MAD produces distortions on unseen data.
  • Figure 4: Examples of the differential effects of makeup removal on age estimation. $1^{st}$ column: DiffClean yields lower predicted ages than with makeup, thus reducing overestimation errors; $2^{nd}$ column: DiffClean preserves the original age when there is minimal or no makeup; $3^{rd}$ column: DiffClean yields higher predicted ages than with makeup, thus reducing underestimation errors. The results strongly suggest that DiffClean retains the age of the original image when no makeup is present.
  • Figure 5: (Top): ROC curve with FaceNet. (Middle): ROC curve with MobileFace. (Bottom): Biometric matching in terms of TMR (%) @FMR = 0.01% (higher is better) with FaceNet and MobileFace matchers on FFHQ. Bolded values indicate best results while underlined values indicate second-best results.
  • ...and 6 more figures
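
The two-class probabilities quoted in the Figure 1 caption can be read as a softmax over the similarities between an image embedding and two text prompts. The sketch below illustrates only that arithmetic. The similarity values passed in, the `makeup_probabilities` name, and the `logit_scale` default of 100 are assumptions for illustration; it does not load a CLIP model or reproduce the paper's classifier.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def makeup_probabilities(sim_with, sim_without, logit_scale=100.0):
    """Return [p(with makeup), p(without makeup)].

    sim_with / sim_without are cosine similarities between the image
    embedding and the two text-prompt embeddings (hypothetical inputs).
    logit_scale is an assumed temperature multiplier.
    """
    return softmax([logit_scale * sim_with, logit_scale * sim_without])
```

Because the similarities are multiplied by a large scale before the softmax, a small gap in cosine similarity between the two prompts is enough to move the probabilities substantially.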