Towards Generalizable Deepfake Detection via Real Distribution Bias Correction

Ming-Hui Liu, Harry Cheng, Xin Luo, Xin-Shun Xu, Mohan S. Kankanhalli

Abstract

To generalize deepfake detectors to future unseen forgeries, most existing methods attempt to simulate the dynamically evolving forgery types using available source domain data. However, predicting an unbounded set of future manipulations from limited prior examples is infeasible. To overcome this limitation, we propose to exploit the invariance of real data from two complementary perspectives: the fixed population distribution of the entire real class and the inherent Gaussianity of individual real images. Building on these properties, we introduce the Real Distribution Bias Correction (RDBC) framework, which consists of two key components: the Real Population Distribution Estimation module and the Distribution-Sampled Feature Whitening module. The former utilizes the independent and identically distributed (i.i.d.) property of real samples to derive the normal distribution form of their statistics, from which the distribution parameters can be estimated using limited source domain data. Based on the learned population distribution, the latter utilizes the inherent Gaussianity of real data as a discriminative prior and performs a sampling-based whitening operation to amplify the Gaussianity gap between real and fake samples. Through synergistic coupling of the two modules, our model captures the real-world properties of real samples, thereby enhancing its generalizability to unseen target domains. Extensive experiments demonstrate that RDBC achieves state-of-the-art performance in both in-domain and cross-domain deepfake detection.

Paper Structure (16 sections, 1 theorem, 18 equations, 8 figures, 5 tables)

Key Result

Theorem 1

Given a sufficient number of i.i.d. random variables, the distribution of specific statistics asymptotically converges to a normal distribution.
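The statement above can be illustrated with a small simulation. The sketch below is not the paper's generalized theorem or its estimation procedure; it only checks the classical special case for one statistic, the sample mean, under assumptions chosen here for illustration: draws from an Exponential(1) distribution (clearly non-Gaussian, skewness 2), 4000 repeated trials, and the hypothetical helper names `skewness` and `sample_means`. For this setup the skewness of the mean of n draws is 2/sqrt(n) in theory, so it should shrink toward 0, the value for a normal distribution, as n grows.

```python
import random
import statistics


def skewness(values):
    """Third standardized moment; 0 for a symmetric (e.g. normal) distribution."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


def sample_means(n, trials, rng):
    """Return `trials` sample means, each computed over n i.i.d. Exponential(1) draws."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]


# Fixed seed so the illustration is reproducible.
rng = random.Random(0)

# n = 1: the "mean" is a single draw, so it keeps the skew of the exponential.
skew_n1 = skewness(sample_means(1, 4000, rng))

# n = 64: the mean of many i.i.d. draws is much closer to symmetric.
skew_n64 = skewness(sample_means(64, 4000, rng))

print(f"skewness of the mean, n=1:  {skew_n1:.3f}")
print(f"skewness of the mean, n=64: {skew_n64:.3f}")
```

Because the statistic's distribution has a known normal form for large n, only its mean and variance remain to be estimated, which is the property the abstract relies on when it estimates distribution parameters from limited source domain data.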

Figures (8)

  • Figure 1: (a) Direction-agnostic augmentation: the features of fake samples in the source domain are uniformly diffused into the target domain. (b) Direction-aware augmentation: the fake features expand toward specific directions, usually guided by prior knowledge. (c) Systematic Bias: Directional distribution shift induced by hardware constraints (e.g., camera algorithms, sensor noise patterns). (d) Sampling Bias: Distributional shrinkage caused by lack of diversity in demographic attributes (e.g., race, gender, and age).
  • Figure 2: (a) The distribution of the real data in different domains. (b) The different distribution patterns of a specific statistic (e.g., the mean $\bar{R}$): the ideal distribution, the empirical distribution with the known normal form, and the empirical distribution without a predefined distributional form.
  • Figure 3: Our RDBC framework consists of two components: i) the Real Population Distribution Estimation Module, which estimates the distribution form and calculates its parameters based on MLE; ii) the Distribution-Sampled Feature Whitening Module, which takes Gaussianity as a discriminative cue and enhances the Gaussian discrepancy between real and fake images through a distribution-sampled whitening operation.
  • Figure 4: Comparisons between the residual histograms. Compared with real images, the fake ones exhibit pronounced leptokurtosis and skewness, demonstrating non-Gaussian characteristics.
  • Figure 5: KDE visualization of real image features from different domains. (a) Backbone: The target features deviate severely from the source. (b) RDBC: Scattered features are aligned into a unified distribution using our method.
  • ...and 3 more figures
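The Figure 4 caption uses skewness and leptokurtosis (positive excess kurtosis) as signs of non-Gaussianity. A minimal sketch of how those two quantities can be measured follows. It is not the paper's pipeline: the data are synthetic stand-ins rather than image residuals (Gaussian draws for the "Gaussian-like" case, Laplace draws for the heavy-tailed case), and the helper names `standardized_moment` and `gaussianity_gap` are hypothetical.

```python
import random
import statistics


def standardized_moment(values, order):
    """Mean of ((v - mean) / std) ** order over the sample."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** order for v in values) / len(values)


def gaussianity_gap(values):
    """Return (skewness, excess kurtosis); both are 0 for an ideal Gaussian."""
    skew = standardized_moment(values, 3)
    excess_kurtosis = standardized_moment(values, 4) - 3.0
    return skew, excess_kurtosis


# Fixed seed so the illustration is reproducible.
rng = random.Random(0)

# Stand-in for Gaussian-like residuals (assumption: synthetic, not image data).
gaussian_like = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# Stand-in for leptokurtic residuals: the difference of two Exponential(1)
# draws follows a Laplace distribution, whose excess kurtosis is 3 in theory.
heavy_tailed = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(20000)]

gauss_skew, gauss_kurt = gaussianity_gap(gaussian_like)
heavy_skew, heavy_kurt = gaussianity_gap(heavy_tailed)

print(f"Gaussian-like: skew={gauss_skew:.3f}, excess kurtosis={gauss_kurt:.3f}")
print(f"Heavy-tailed:  skew={heavy_skew:.3f}, excess kurtosis={heavy_kurt:.3f}")
```

The Laplace stand-in is symmetric, so it only exercises the kurtosis half of the caption; a skewed stand-in would be needed to show a nonzero skewness.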

Theorems & Definitions (1)

  • Theorem 1: Generalized CLT