Towards Quantitative Evaluation of Explainable AI Methods for Deepfake Detection

Konstantinos Tsigos; Evlampios Apostolidis; Spyridon Baxevanakis; Symeon Papadopoulos; Vasileios Mezaris

TL;DR

This paper tackles the problem of quantitatively evaluating explainable AI methods for deepfake detection. It introduces an adversarial-explanation framework that measures how well explanation-driven perturbations reveal the regions most influential to the detector's decision. The authors use a state-of-the-art EfficientNet-based detector trained on FaceForensics++ and compare five explanation methods (Grad-CAM++, RISE, SHAP, LIME, SOBOL) under the new framework. They show that LIME consistently produces the most informative explanations, yielding the largest drop in detection accuracy when the top highlighted regions are perturbed, and they support this finding with both quantitative and qualitative analyses. The work advances trustworthy AI for deepfake detection by offering a simple, broadly applicable evaluation methodology that can guide practitioners in selecting effective explanation methods for detectors in real-world scenarios.
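The comparison described above reduces to a few aggregate measures per explanation method. Below is a minimal sketch, assuming each method yields per-image (score before attack, score after attack) pairs for fake images and that a "fake" score of at least 0.5 counts as a "fake" decision; both are assumptions, not details taken from the paper. The function names `summarize` and `rank_methods` are hypothetical, and the method names and numbers in the usage example are placeholders, not the paper's results.

```python
def summarize(pairs, threshold=0.5):
    """Aggregate per-image (score_before, score_after) pairs for fake images.

    Detection accuracy on fake images is the fraction still classified as
    fake (score >= threshold); the drop compares before vs. after the attack.
    """
    n = len(pairs)
    acc_before = sum(b >= threshold for b, _ in pairs) / n
    acc_after = sum(a >= threshold for _, a in pairs) / n
    mean_score_drop = sum(b - a for b, a in pairs) / n
    return {"accuracy_drop": acc_before - acc_after,
            "mean_score_drop": mean_score_drop}


def rank_methods(per_method, threshold=0.5):
    """Order explanation methods from largest to smallest accuracy drop,
    breaking ties by the mean drop in the 'fake' score."""
    summaries = {name: summarize(pairs, threshold)
                 for name, pairs in per_method.items()}
    return sorted(
        summaries,
        key=lambda name: (summaries[name]["accuracy_drop"],
                          summaries[name]["mean_score_drop"]),
        reverse=True,
    )


if __name__ == "__main__":
    # Placeholder scores for two hypothetical methods, not results from the paper.
    per_method = {
        "method_a": [(0.9, 0.2), (0.8, 0.7)],
        "method_b": [(0.9, 0.6), (0.8, 0.7)],
    }
    print(rank_methods(per_method))  # ['method_a', 'method_b']
```

Ranking primarily by accuracy drop (flipped decisions) and secondarily by the mean score drop mirrors the two effects the framework looks for: flipping the detector's prediction or at least reducing it.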

Abstract

In this paper, we propose a new framework for evaluating the performance of explanation methods on the decisions of a deepfake detector. This framework assesses the ability of an explanation method to spot the regions of a fake image that have the biggest influence on the detector's decision, by examining the extent to which these regions can be modified through a set of adversarial attacks in order to flip the detector's prediction or reduce its initial prediction score. We anticipate a larger drop in deepfake detection accuracy and prediction score for methods that spot these regions more accurately. Based on this framework, we conduct a comparative study using a state-of-the-art deepfake detection model trained on the FaceForensics++ dataset and five explanation methods from the literature. The findings of our quantitative and qualitative evaluations document the superior performance of the LIME explanation method compared with the other methods, and identify it as the most appropriate for explaining the decisions of the utilized deepfake detector.
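The evaluation idea in the abstract can be sketched as follows: rank the regions of a fake image by the importance an explanation assigns to them, attack only the top-ranked regions, and record whether the detector's "fake" score drops or its decision flips. This is a minimal sketch under stated assumptions, not the authors' implementation: the explanation is assumed to arrive as a superpixel label map (`segments`, labels `0..n-1`) plus one importance score per segment, the detector is assumed to be a callable returning a "fake" probability, and the paper's adversarial attacks are replaced by a simple bounded random-search attack. All function names (`top_k_mask`, `random_search_attack`, `evaluate_explanation`, `toy_detector`) and parameters (`k`, `eps`, `steps`, `seed`, `threshold`) are hypothetical.

```python
import numpy as np


def top_k_mask(segments, importance, k):
    """Boolean HxW mask covering the k segments with the highest importance.

    Assumes segment labels are 0..n-1 and importance[i] scores segment i.
    """
    top_ids = np.argsort(importance)[::-1][:k]
    return np.isin(segments, top_ids)


def random_search_attack(image, mask, detector, eps=0.05, steps=50, seed=0):
    """Stand-in for the paper's adversarial attacks (an assumption): a
    bounded random-search attack that only changes pixels inside `mask`
    and keeps the candidate with the lowest 'fake' score, starting from
    the unmodified image."""
    rng = np.random.default_rng(seed)
    best_img, best_score = image.copy(), detector(image)
    for _ in range(steps):
        noise = rng.uniform(-eps, eps, size=image.shape)
        # Zero the noise outside the mask so only top regions are modified.
        candidate = np.clip(image + noise * mask[..., None], 0.0, 1.0)
        score = detector(candidate)
        if score < best_score:
            best_img, best_score = candidate, score
    return best_img, best_score


def evaluate_explanation(image, segments, importance, detector,
                         k=1, threshold=0.5, eps=0.05, steps=50, seed=0):
    """Attack the k most important regions and report the effect."""
    before = detector(image)
    mask = top_k_mask(segments, importance, k)
    _, after = random_search_attack(image, mask, detector, eps, steps, seed)
    return {"score_before": before,
            "score_after": after,
            "score_drop": before - after,
            "flipped": before >= threshold and after < threshold}


def toy_detector(x):
    """Hypothetical detector whose 'fake' score depends only on the right half."""
    return float(x[:, 4:, 0].mean())


if __name__ == "__main__":
    # Toy setup: an 8x8 RGB image split into two superpixels (left=0, right=1).
    image = np.full((8, 8, 3), 0.6)
    segments = np.zeros((8, 8), dtype=int)
    segments[:, 4:] = 1
    print(evaluate_explanation(image, segments, np.array([0.1, 0.9]), toy_detector))
    print(evaluate_explanation(image, segments, np.array([0.9, 0.1]), toy_detector))
```

With the toy detector, an explanation that ranks the right half highest produces a positive score drop, while one that ranks the left half highest leaves the score unchanged. This is the contrast the framework relies on: the better an explanation localizes the influential regions, the larger the drop.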

Paper Structure (11 sections, 3 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Comparative Study Setup
  Deepfake detection model
  Explanation methods
  Evaluation framework and measures
Experiments
  Dataset and implementation details
  Quantitative results
  Qualitative results
Conclusions

Figures (3)

Figure 1: The processing pipeline of the proposed evaluation framework.
Figure 2: Explanations produced by the LIME method (the best-performing one according to the results in the Experiments section) for three non-manipulated images of the FaceForensics++ dataset that were correctly classified as "real".
Figure 3: The visual explanations obtained from the considered explanation methods for four different images of the FaceForensics++ dataset (one per type of manipulation). For visualization, we adopt the default format supported by each explanation method.
