Exploring Specular Reflection Inconsistency for Generalizable Face Forgery Detection

Hongyan Fei, Zexi Jia, Chuanwei Huang, Jinchao Zhang, Jie Zhou

TL;DR

This work presents a physics-informed approach to face forgery detection by exploiting the specular reflection component in the Phong illumination model, which is harder to replicate than ambient or diffuse components. It introduces a fast Retinex-based texture estimation to achieve accurate specular separation and a two-stage cross-attention network (SRI-Net) that fuses specular reflection with face texture and direct light to detect forgeries. The approach achieves state-of-the-art results on traditional benchmarks and demonstrates strong robustness on diffusion-generated forgery datasets (DiFF, DF40). The method offers a principled, generalizable direction for forgery detection that leverages illumination physics beyond conventional spatial/frequency cues.

Abstract

Detecting deepfakes has become increasingly challenging as forged faces synthesized by generative AI methods, particularly diffusion models, achieve unprecedented quality and resolution. Existing forgery detection approaches relying on spatial and frequency features demonstrate limited efficacy against high-quality, entirely synthesized forgeries. In this paper, we propose a novel detection method grounded in the observation that facial attributes governed by complex physical laws and multiple parameters are inherently difficult to replicate. Specifically, we focus on illumination, particularly the specular reflection component in the Phong illumination model, which poses the greatest replication challenge due to its parametric complexity and nonlinear formulation. We introduce a fast and accurate face texture estimation method based on Retinex theory to enable precise specular reflection separation. Furthermore, drawing from the mathematical formulation of specular reflection, we posit that forgery evidence manifests not only in the specular reflection itself but also in its relationship with the corresponding face texture and direct light. To capture these relationships, we design the Specular-Reflection-Inconsistency-Network (SRI-Net), which incorporates a two-stage cross-attention mechanism to model these correlations and integrates specular-reflection-related features with image features for robust forgery detection. Experimental results demonstrate that our method achieves superior performance on both traditional deepfake datasets and generative deepfake datasets, particularly those containing diffusion-generated forged faces.
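
To make the "nonlinear formulation" of the specular term concrete, here is a minimal sketch of the textbook Phong illumination model at a single surface point. It is not the paper's implementation: the function name `phong_intensity`, the coefficients `k_a`, `k_d`, `k_s`, the exponent `shininess`, and all default values are illustrative assumptions. The point is only that ambient and diffuse terms are constant or linear in the geometry, while the specular term depends on the view direction through a power law.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def phong_intensity(normal, light_dir, view_dir,
                    k_a=0.1, k_d=0.6, k_s=0.3, shininess=32.0,
                    i_a=1.0, i_l=1.0):
    """Return (ambient, diffuse, specular) at one surface point.

    Textbook Phong model; all coefficient values are illustrative
    assumptions, not values taken from the paper.
    """
    n = normalize(normal)
    l = normalize(light_dir)  # from the surface toward the light
    v = normalize(view_dir)   # from the surface toward the camera
    n_dot_l = dot(n, l)

    ambient = k_a * i_a                      # constant term
    diffuse = k_d * max(n_dot_l, 0.0) * i_l  # linear in n . l

    # Mirror the light direction about the normal: r = 2 (n . l) n - l
    r = tuple(2.0 * n_dot_l * nc - lc for nc, lc in zip(n, l))
    if n_dot_l > 0.0:
        # Power-law dependence on r . v makes this term highly nonlinear
        specular = k_s * max(dot(r, v), 0.0) ** shininess * i_l
    else:
        specular = 0.0  # light is behind the surface
    return ambient, diffuse, specular


if __name__ == "__main__":
    head_on = phong_intensity((0, 0, 1), (0, 0, 1), (0, 0, 1))
    tilt = math.radians(20.0)
    tilted = phong_intensity((0, 0, 1), (0, 0, 1),
                             (math.sin(tilt), 0.0, math.cos(tilt)))
    print("head-on:", head_on)
    print("tilted view:", tilted)
```

In this sketch, tilting the camera by 20 degrees leaves the ambient and diffuse terms unchanged but shrinks the specular term sharply, which is the kind of tightly coupled dependence on shape, light, and view that the abstract argues is hard for a generator to reproduce.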

Paper Structure (21 sections, 14 equations, 7 figures, 6 tables)

This paper contains 21 sections, 14 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: The visualization of (a) Spatial-based face forgery detection methods, (b) Frequency-based face forgery detection methods and (c) Our proposed Specular-Reflection-Inconsistency-Network (SRI-Net). SRI-Net builds on the observation, derived from the mathematical form of the general Phong illumination model, that specular reflection is more difficult to replicate and therefore contains generalizable forgery evidence.
  • Figure 2: The framework of our proposed face forgery detection method. First, we use 3DDFA to extract the 3D shape and propose a fast and accurate Retinex-based method for texture extraction. Next, we employ a spherical harmonic model to fit ambient and direct light, extracting specular reflection through a residual-based approach under Retinex-based texture constraints. We then propose the Specular-Reflection-Inconsistency-Network (SRI-Net) with a two-stage cross-attention structure to capture correlations among specular reflection, texture, and direct light. Finally, SRI-Net combines these specular-reflection-related features with image features for the final real/fake decision.
  • Figure 3: The visualization of (a) the specular reflection estimation process under different types of texture constraints and (b) a comparison of the detailed differences in the resulting specular reflection.
  • Figure 4: The visualization of Specular Reflection Extraction. The face image can be decomposed into 3D shape, Retinex-based texture, ambient light, direct light, and specular reflection under Phong illumination model constraints. The samples on the left are real samples, while the samples on the right are fake samples.
  • Figure 5: The visualization of misclassified cases. These cases are characterized by extreme facial poses or severe self-occlusion of the facial surface.
  • ...and 2 more figures
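
The captions for Figures 2 and 4 describe a residual-based extraction: once texture, ambient light, and direct light have been estimated, whatever the non-specular terms cannot explain is attributed to specular reflection. The sketch below illustrates that idea on plain nested lists. It is an assumption-laden simplification, not the paper's pipeline: the function name `specular_residual`, the per-pixel formula `image - texture * (ambient + direct)`, and the clamp at zero are all hypothetical, and the spherical harmonic fitting and Retinex-based texture estimation that would produce the inputs are not shown.

```python
def specular_residual(image, texture, ambient, direct):
    """Per-pixel specular estimate as a clamped residual.

    Hypothetical simplification: the non-specular part of each pixel is
    modelled as texture * (ambient + direct), and anything left over is
    treated as specular reflection. Inputs are same-shaped 2D lists of
    floats in [0, 1]; how they are estimated is outside this sketch.
    """
    residual = []
    for img_row, tex_row, amb_row, dir_row in zip(image, texture, ambient, direct):
        residual.append([
            # Negative leftovers are clamped: reflection cannot remove light
            max(i - t * (a + d), 0.0)
            for i, t, a, d in zip(img_row, tex_row, amb_row, dir_row)
        ])
    return residual


if __name__ == "__main__":
    image = [[0.9, 0.5]]      # observed intensities
    texture = [[0.8, 0.8]]    # estimated albedo
    ambient = [[0.25, 0.25]]  # fitted ambient shading
    direct = [[0.5, 0.5]]     # fitted direct (diffuse) shading
    print(specular_residual(image, texture, ambient, direct))
```

Under these assumptions, the first pixel is brighter than texture and shading can explain, so it receives a positive specular value, while the second pixel is darker and is clamped to zero. It also shows why texture accuracy matters: any error in the texture estimate leaks directly into the residual.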