AFFAKT: A Hierarchical Optimal Transport based Method for Affective Facial Knowledge Transfer in Video Deception Detection

Zihan Ji, Xuetao Tian, Ye Liu

TL;DR

AFFAKT tackles data scarcity in video deception detection by transferring knowledge from a large facial expression recognition dataset through hierarchical optimal transport (OT), which produces a transport plan T mapping deception samples to facial expression classes. In the SRKB module, a momentum-updated correlation prototype B guides test-time refinement through sample-specific re-weighting of the transferred information, and training minimizes the combined objective L = L_ce + η·L_ot to align the source and target spaces. The approach achieves state-of-the-art results on the RLT and DOLOS datasets across visual, audio, and fused modalities, and interpretability analyses associate deception with negative affect such as fear and sadness, consistent with psychological theory. Overall, by combining OT-based cross-domain knowledge transfer with invariant correlation prototypes, AFFAKT supports robust deception detection when labeled data are limited, with practical implications for real-world screening and analysis.
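The summary does not spell out the hierarchical OT formulation, so the following is only a minimal, flat sketch under stated assumptions. It swaps in standard entropic OT solved with Sinkhorn iterations, uses a cosine cost between deception-sample features and facial expression class prototypes, assumes uniform marginals, and takes L_ot to be the transport cost ⟨T, C⟩. The helper names `cosine_cost`, `sinkhorn`, `cross_entropy`, and `ot_transfer_step`, the parameter `eps`, and the random stand-in data are all hypothetical. It shows how a plan T of shape (samples × classes) can be turned into transferred features and how L = L_ce + η·L_ot might be assembled.

```python
import numpy as np


def cosine_cost(x, y):
    """Pairwise cost C[i, j] = 1 - cos(x_i, y_j); values lie in [0, 2]."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T


def sinkhorn(cost, eps=0.5, n_iters=200):
    """Entropic OT (Sinkhorn) with uniform marginals; returns plan T of shape (n, k)."""
    n, k = cost.shape
    a = np.full(n, 1.0 / n)  # mass on target (deception) samples
    b = np.full(k, 1.0 / k)  # mass on source (facial expression) classes
    K = np.exp(-cost / eps)  # Gibbs kernel
    u, v = np.ones(n), np.ones(k)
    for _ in range(n_iters):
        u = a / (K @ v)  # rescale rows toward marginal a
        v = b / (K.T @ u)  # rescale columns to match marginal b
    return u[:, None] * K * v[None, :]


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed stably via log-sum-exp."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def ot_transfer_step(target_feats, source_protos, logits, labels, eta=0.1, eps=0.5):
    """Hypothetical step: plan T, transferred features, and L = L_ce + eta * L_ot."""
    cost = cosine_cost(target_feats, source_protos)
    T = sinkhorn(cost, eps=eps)
    # Each renormalised row of T says how much of each source class a sample receives.
    weights = T / T.sum(axis=1, keepdims=True)
    transferred = weights @ source_protos  # (n, d) knowledge pulled from source classes
    l_ot = float(np.sum(T * cost))  # transport cost <T, C>, non-negative
    loss = cross_entropy(logits, labels) + eta * l_ot
    return T, transferred, loss


# Random stand-in data: 8 deception clips, 7 expression-class prototypes, 16-dim features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
protos = rng.normal(size=(7, 16))
logits = rng.normal(size=(8, 2))
labels = np.array([0, 1, 0, 1, 1, 0, 0, 1])
T, transferred, loss = ot_transfer_step(feats, protos, logits, labels)
```

A larger `eps` spreads each sample's mass over more expression classes and makes Sinkhorn converge quickly; a smaller `eps` gives a sharper, more selective plan but may need more iterations.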

Abstract

The scarcity of high-quality, large-scale labeled datasets poses a major challenge for applying deep learning models to video deception detection. To address this issue, and inspired by psychological theory on the relation between deception and facial expressions, we propose AFFAKT, a novel method that improves classification performance by transferring useful, correlated knowledge from a large facial expression dataset. Knowledge transfer raises two key challenges: 1) how much knowledge from the facial expression data should be transferred, and 2) how to effectively leverage the transferred knowledge in the deception classification model during inference. Specifically, the proposed H-OTKT module first quantifies the optimal relation mapping between facial expression classes and deception samples and then uses it to transfer knowledge from the facial expression dataset to the deception samples. Moreover, the proposed SRKB module maintains a correlation prototype that retains the invariant correlations between facial expression classes and deception classes through momentum updating. During inference, the transferred knowledge is refined with this correlation prototype using a sample-specific re-weighting strategy. Experimental results on two deception detection datasets demonstrate the superior performance of the proposed method, and an interpretability study reveals strong associations between deception and negative affect, which agrees with psychological theory.
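The abstract describes the SRKB correlation prototype only at a high level. As an illustration, and not the paper's exact update or re-weighting rule, the sketch below assumes that row c of the prototype B is an exponential moving average (momentum `m`) of the row-normalised transport-plan rows of training samples from deception class c, and that test-time re-weighting blends a sample's own relation vector with a softmax-weighted mix of the rows of B. The functions `momentum_update` and `reweight_test_sample`, the parameters `m`, `lam`, and `tau`, and the random data are hypothetical.

```python
import numpy as np


def momentum_update(B, relations, labels, m=0.9):
    """EMA update of the correlation prototype B (hypothetical form).

    B:         (num_deception_classes, num_source_classes); row c is class c's
               relation to the facial expression classes.
    relations: (n, num_source_classes); row-normalised transport-plan rows.
    labels:    (n,) deception labels for the batch.
    """
    B = B.copy()
    for c in range(B.shape[0]):
        mask = labels == c
        if mask.any():  # classes absent from the batch keep their previous row
            B[c] = m * B[c] + (1.0 - m) * relations[mask].mean(axis=0)
    return B


def reweight_test_sample(t, B, lam=0.5, tau=0.1):
    """Hypothetical sample-specific re-weighting at test time.

    Scores each prototype row by cosine similarity to the sample's relation t,
    converts the scores to softmax weights, and blends t with the weighted prototype.
    """
    sims = (B @ t) / (np.linalg.norm(B, axis=1) * np.linalg.norm(t) + 1e-12)
    w = np.exp((sims - sims.max()) / tau)
    w /= w.sum()
    prior = w @ B  # convex combination of prototype rows
    return lam * t + (1.0 - lam) * prior


# 2 deception classes (truthful / deceptive), 7 source expression classes.
rng = np.random.default_rng(1)
B = np.full((2, 7), 1.0 / 7)  # start from uniform relations
rel = rng.random((6, 7))
rel /= rel.sum(axis=1, keepdims=True)  # each row is a distribution over source classes
labels = np.array([0, 0, 0, 1, 1, 1])
B = momentum_update(B, rel, labels)
refined = reweight_test_sample(rel[0], B)
```

Because every row involved is a probability distribution and both steps are convex combinations, B and the refined relation stay valid distributions over the source classes; the momentum keeps B stable across batches, which matches the stated goal of retaining invariant correlations.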


Paper Structure

This paper contains 26 sections, 13 equations, 5 figures, 7 tables, 2 algorithms.

Figures (5)

  • Figure 1: ACC and loss remain unchanged after 30 epochs.
  • Figure 2: (a) Source features are extracted in advance by a pre-trained encoder. (b) The pipeline of the proposed AFFAKT; its four modules are shown on a blue background.
  • Figure 3: H-OTKT module. It formulates the relation mapping between source classes and target samples, and then performs knowledge transfer.
  • Figure 4: SRKB module. (a) Training phase: $\mathbf{B}$ is momentum-updated to maintain the invariant knowledge of each target class's relation to the source classes; (b) testing phase: the SRKB module uses the learned $\mathbf{B}$ and a sample-specific re-weighting strategy to improve detection performance.
  • Figure 5: Sensitivity analysis results on the RLT dataset under the visual modality. (a) Accuracy with different $\xi$, (b) Accuracy with different $\nu$, (c) Accuracy with different $\eta$, and (d) Accuracy with different $\alpha$.