Advancing Automated Deception Detection: A Multimodal Approach to Feature Extraction and Analysis

Mohamed Bahaa, Mena Hany, Ehab E. Zakaria

TL;DR

This work tackles automated deception detection in video by building a multimodal feature extraction framework that spans visual, audio, and linguistic cues. It systematically evaluates three model families (LSTM, BiLSTM, and pretrained CNNs) across single-, dual-, and triple-modality configurations on a courtroom-trial dataset. The results show that multimodal fusion yields substantial gains, with the triple-modality LSTM reaching up to 99% accuracy, which underscores the value of integrating diverse signals for deception detection. The study highlights the importance of feature engineering for interpretability and provides a foundation for future multimodal deception detectors in security, law, and media contexts.

Abstract

With the exponential increase in video content, accurate deception detection in human-centric video analysis has become increasingly important. This research focuses on extracting and combining features to improve the accuracy of deception detection models. By systematically extracting features from visual, audio, and text data and experimenting with different combinations, we developed a model that achieved 99% accuracy. Our methodology emphasizes the role of feature engineering in deception detection and provides a clear, interpretable framework. We trained several machine learning models, including LSTM, BiLSTM, and pretrained CNNs, using both single-modal and multimodal approaches. The results show that combining multiple modalities significantly improves detection performance compared to single-modality training. This study highlights the potential of strategic feature extraction and combination for building reliable and transparent automated deception detection systems for video analysis, and it lays groundwork for more accurate detection methods in future research.
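
The single-, dual-, and triple-modality experiments described above can be pictured as enumerating every non-empty subset of the three modalities and fusing the matching feature vectors. The following is a minimal sketch, not the authors' implementation: the function names, the example feature values, and the use of plain concatenation as the fusion step are all assumptions made for illustration, since the abstract does not say how the modalities are combined.

```python
from itertools import combinations

# The three modalities named in the abstract.
MODALITIES = ("visual", "audio", "text")


def modality_configurations(modalities=MODALITIES):
    """Return every non-empty subset of modalities, smallest subsets first."""
    return [
        combo
        for size in range(1, len(modalities) + 1)
        for combo in combinations(modalities, size)
    ]


def fuse_features(features, config):
    """Concatenate the feature vectors of the modalities listed in config.

    Concatenation (feature-level fusion) is an assumption here; the
    abstract does not specify the fusion strategy.
    """
    fused = []
    for name in config:
        fused.extend(features[name])
    return fused


# Hypothetical per-clip feature vectors, for illustration only.
example_features = {
    "visual": [0.1, 0.2, 0.3],
    "audio": [0.5, 0.6],
    "text": [0.9],
}

configs = modality_configurations()
for config in configs:
    print(config, len(fuse_features(example_features, config)))
```

Three modalities give three single, three dual, and one triple configuration, so a full sweep trains each model family on seven feature sets.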

Paper Structure

This paper contains 18 sections, 8 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: An example of spontaneous expressions with a truthful response (left) and a deceptive response (right).
  • Figure 2: Our proposed framework.
  • Figure 3: (a) The left panel displays a frontal image of a person's face with several small red dots marking the facial landmarks. (b) The right panel presents a schematic representation of these facial landmarks.
  • Figure 4: Dataset Figure Showing the Dataset.
  • Figure 5: Samples from the real-life trial dataset.