DeepFake Detection in Dyadic Video Calls using Point of Gaze Tracking

Odin Kohler, Rahul Vijaykumar, Masudul H. Imtiaz

TL;DR

This paper addresses the risk of real-time deepfakes in one-on-one video calls by introducing a real-time detector based on point-of-gaze (PoG) tracking, which exploits gaze as a subtle social cue that is difficult for deepfakes to mimic. It builds a custom multi-modal dataset, compares real-time deepfake generators, and develops a lightweight 2D CNN that uses 298 gaze-derived features plus six landmark spectrograms to detect fakes, achieving 82.52% accuracy and 88% ROC. The approach demonstrates the viability of PoG as a biometric signal for deepfake detection in conversational contexts, and the code and dataset are open-sourced to spur further research. Limitations include sensitivity to glare, occlusions, and limited demographic representation; these point future work toward multi-party calls and additional biometric cues for robustness.

Abstract

With recent advancements in deepfake technology, it is now possible to generate convincing deepfakes in real time. Unfortunately, malicious actors have started to use this new technology to perform real-time phishing attacks during video meetings. The nature of a video call allows access to what the deepfake is "seeing," that is, the screen displayed to the malicious actor. Combining this with the gaze estimated from the malicious actor's streamed video enables us to estimate where the deepfake is looking on screen: the point of gaze. Because the point of gaze during conversations is not random and is instead used as a subtle nonverbal communicator, it can be used to detect deepfakes, which are not capable of mimicking this subtle nonverbal communication. This paper proposes a real-time deepfake detection method adapted to this genre of attack, utilizing previously unavailable biometric information. We built our model based on explainable features selected after careful review of research on gaze patterns during dyadic conversations. We then test our model on a novel dataset of our creation, achieving an accuracy of 82%. This is the first reported method to utilize point-of-gaze tracking for deepfake detection.

Paper Structure

This paper contains 22 sections, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Structure of Dataset.
  • Figure 2: Example of gaze vector obtained with MPIIFaceGaze.
  • Figure 3: Distribution of distances from PoG to tip of nose in pixels, sorted by speaking status.
  • Figure 4: Average PSD of each class, computed by calculating the PSD of every 1800-frame segment (with an nperseg of 90) and then averaging each frequency across segments.
  • Figure 5: The six facial landmarks used to generate features.
  • ...and 3 more figures
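
The averaging described in the Figure 4 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes the PSD is computed with SciPy's `scipy.signal.welch` (the caption's `nperseg` matches that function's parameter name), that segments are non-overlapping, and that the video runs at 30 frames per second. The frame rate, the input signal, and the helper names `segment_psds` and `class_average_psd` are hypothetical and do not come from the paper.

```python
import numpy as np
from scipy.signal import welch

SEGMENT_FRAMES = 1800  # segment length, from the Figure 4 caption
NPERSEG = 90           # Welch window length, from the Figure 4 caption
FPS = 30.0             # ASSUMPTION: frame rate is not stated in this summary


def segment_psds(signal, fs=FPS, segment_frames=SEGMENT_FRAMES, nperseg=NPERSEG):
    """Return (freqs, psds) with one Welch PSD row per full segment.

    `signal` is any per-frame 1-D sequence (hypothetical input). Segments are
    taken back to back without overlap, which is an assumption; a trailing
    partial segment is dropped.
    """
    signal = np.asarray(signal, dtype=float)
    n_segments = len(signal) // segment_frames
    if n_segments == 0:
        raise ValueError("signal is shorter than one segment")
    rows = []
    for i in range(n_segments):
        segment = signal[i * segment_frames:(i + 1) * segment_frames]
        freqs, psd = welch(segment, fs=fs, nperseg=nperseg)
        rows.append(psd)
    return freqs, np.vstack(rows)


def class_average_psd(recordings, **kwargs):
    """Average each frequency bin over every segment of every recording in a class."""
    all_rows = []
    freqs = None
    for recording in recordings:
        freqs, rows = segment_psds(recording, **kwargs)
        all_rows.append(rows)
    if not all_rows:
        raise ValueError("no recordings given")
    # Stack first so that every segment carries equal weight in the mean.
    return freqs, np.vstack(all_rows).mean(axis=0)
```

Calling `class_average_psd` once on the real recordings and once on the fake recordings would give one averaged curve per class, which is the kind of comparison the caption describes. Stacking all segments before averaging weights each segment equally; averaging per recording first would be a different, equally plausible reading of the caption.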