AFFAKT: A Hierarchical Optimal Transport based Method for Affective Facial Knowledge Transfer in Video Deception Detection
Zihan Ji, Xuetao Tian, Ye Liu
TL;DR
AFFAKT tackles data scarcity in video deception detection by transferring knowledge from a large facial expression recognition dataset through hierarchical optimal transport (OT), producing a transport plan T that maps deception samples to facial expression classes. In the SRKB module, a momentum-updated correlation prototype B guides test-time refinement through a sample-specific re-weighting of the transferred information. Training uses a combined objective L = L_ce + eta L_ot, in which the OT term aligns the source and target spaces. The approach yields state-of-the-art results on RTL and DOLOS across visual, audio, and fused modalities, and interpretability analyses associate deception with negative affect such as fear and sadness, consistent with psychological theory. Overall, AFFAKT enables robust deception detection under limited labeled data by combining cross-domain knowledge transfer via OT with invariant correlation prototypes, with practical implications for real-world screening and analysis.
Abstract
The scarcity of high-quality, large-scale labeled datasets poses a major challenge for applying deep learning models to video deception detection. To address this issue, and inspired by psychological theory on the relation between deception and facial expressions, we propose a novel method called AFFAKT, which improves classification performance by transferring useful, correlated knowledge from a large facial expression dataset. Two key challenges arise in this knowledge transfer: 1) \textit{how much} knowledge from the facial expression data should be transferred, and 2) \textit{how to} effectively leverage the transferred knowledge in the deception classification model during inference. Specifically, the proposed H-OTKT module first quantifies the optimal relation mapping between facial expression classes and deception samples, and then transfers knowledge from the facial expression dataset to the deception samples accordingly. Moreover, a correlation prototype in a second proposed module, SRKB, retains the invariant correlations between facial expression classes and deception classes through momentum updating. During inference, the transferred knowledge is refined with the correlation prototype using a sample-specific re-weighting strategy. Experimental results on two deception detection datasets demonstrate the superior performance of the proposed method. The interpretability study reveals strong associations between deception and negative affect, which is consistent with psychological theory.
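To make the two mechanisms named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It uses plain (single-level) entropic optimal transport solved with Sinkhorn iterations in place of the hierarchical formulation of H-OTKT, and a simple exponential moving average as a stand-in for the momentum-updated correlation prototype in SRKB. All names, shapes, the squared Euclidean cost, the uniform marginals, and the hyperparameters `eps` and `m` are assumptions chosen for illustration; the features are random placeholders, not real data.

```python
import numpy as np


def sinkhorn(cost, a, b, eps=0.5, n_iters=200):
    """Entropic-regularised OT via Sinkhorn iterations.

    Returns a plan T whose rows sum to `a` and whose columns
    (approximately, after convergence) sum to `b`.
    """
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # match column marginals
        u = a / (K @ v)              # match row marginals
    return u[:, None] * K * v[None, :]


def momentum_update(prototype, batch_estimate, m=0.9):
    """Exponential moving average, used here as a stand-in for momentum updating."""
    return m * prototype + (1.0 - m) * batch_estimate


# Hypothetical sizes: 8 deception samples, 7 expression classes, 16-d features.
rng = np.random.default_rng(0)
n_samples, n_classes, dim = 8, 7, 16
deception_feats = rng.normal(size=(n_samples, dim))    # placeholder target features
expr_class_means = rng.normal(size=(n_classes, dim))   # placeholder source class centres

# Assumed cost: squared Euclidean distance, scaled to [0, 1].
diff = deception_feats[:, None, :] - expr_class_means[None, :, :]
cost = (diff ** 2).sum(axis=-1)
cost = cost / cost.max()

# Assumed uniform marginals over samples and classes.
a = np.full(n_samples, 1.0 / n_samples)
b = np.full(n_classes, 1.0 / n_classes)
T = sinkhorn(cost, a, b)

# "How much" to transfer: each row of T, normalised, weights the expression classes.
weights = T / T.sum(axis=1, keepdims=True)
transferred = weights @ expr_class_means               # (n_samples, dim)

# Correlation prototype B: one row per deception class (truthful / deceptive),
# one column per expression class, smoothed across batches.
labels = np.array([0, 1, 0, 1, 0, 1, 0, 1])            # placeholder labels
batch_B = np.stack([weights[labels == c].mean(axis=0) for c in (0, 1)])
B = np.full((2, n_classes), 1.0 / n_classes)           # uniform initialisation
B = momentum_update(B, batch_B)
```

In this sketch each row of `B` remains a distribution over expression classes, since it is a convex combination of normalised rows. How the actual method uses the prototype to re-weight transferred knowledge at inference is not specified in the abstract, so that step is deliberately left out here.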
