Enhancing Lie Detection Accuracy: A Comparative Study of Classic ML, CNN, and GCN Models using Audio-Visual Features

Abdelrahman Abdelwahab, Akshaj Vishnubhatla, Ayaan Vaswani, Advait Bharathulwar, Arnav Kommaraju

TL;DR

The unique multimodal transformer architecture used in this study improves upon previous approaches by using auditory inputs, visual facial micro-expressions, and manually transcribed gesture annotations, moving closer to a reliable non-invasive lie detection model.

Abstract

Inaccuracies in polygraph tests often lead to wrongful convictions, false information, and bias, all of which have significant consequences for both legal and political systems. Recently, the analysis of facial micro-expressions has emerged as a method for detecting deception; however, current models have not achieved high accuracy or generalizability. This study aims to help remedy these problems. The multimodal transformer architecture used in this study improves upon previous approaches by combining auditory inputs, visual facial micro-expressions, and manually transcribed gesture annotations, moving closer to a reliable, non-invasive lie detection model. Visual and auditory features were extracted with the Vision Transformer and openSMILE, respectively, and then concatenated with the transcriptions of participants' micro-expressions and gestures. Several models were trained on these processed, concatenated features to classify statements as lies or truths. The CNN Conv1D multimodal model achieved an average accuracy of 95.4%. However, further research is still required to create higher-quality datasets and more generalized models for more diverse applications.
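
The concatenation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `fuse_features`, the toy vector lengths, the placeholder values, and the assumption that gesture annotations are encoded as binary presence flags are all hypothetical and chosen only to show the idea of joining the three feature blocks into one vector per clip.

```python
from typing import List, Sequence


def fuse_features(
    visual: Sequence[float],
    audio: Sequence[float],
    gestures: Sequence[int],
) -> List[float]:
    """Concatenate per-clip feature blocks into one vector (early fusion)."""
    # Assumption: gesture annotations are binary presence flags (0 or 1).
    if any(g not in (0, 1) for g in gestures):
        raise ValueError("gesture annotations must be 0 or 1")
    # Order is visual, then audio, then gestures; any fixed order works
    # as long as it is the same for every clip.
    return (
        [float(v) for v in visual]
        + [float(a) for a in audio]
        + [float(g) for g in gestures]
    )


# Toy placeholder values; real feature vectors would be far longer.
visual_embedding = [0.12, -0.40, 0.88, 0.05]  # stand-in for a Vision Transformer embedding
audio_descriptors = [210.5, 0.031, -17.2]     # stand-in for openSMILE descriptors
gesture_flags = [1, 0, 0, 1, 1]               # stand-in for annotated gestures

fused = fuse_features(visual_embedding, audio_descriptors, gesture_flags)
print(len(fused))
```

A fused vector of this kind is what a downstream classifier, such as the Conv1D model mentioned above, would take as input; how the features were scaled or ordered in the study is not stated in the abstract.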

Paper Structure

This paper contains 24 sections, 13 equations, 13 figures, 4 tables.

Figures (13)

  • Figure 1: Diagram of the sigmoid function [Yang2019]
  • Figure 2: Diagram of the GCN [GeeksGCN2024]
  • Figure 3: Basic architecture of the GCN [GeeksGCN2024]
  • Figure 4: Illustration of three consecutive layers in a 1D-CNN [Ige2024]
  • Figure 5: Sample frames from the dataset of [6], showing hand movements, facial micro-expressions, and facial reactions.
  • ...and 8 more figures