Effects of Soft-Domain Transfer and Named Entity Information on Deception Detection

Steven Triplett, Simon Minami, Rakesh Verma

TL;DR

We use eight datasets from various domains to evaluate how they affect classifier performance when combined through transfer learning via intermediate layer concatenation of fine-tuned BERT models, and we find improvements in accuracy over the baseline.

Abstract

An enormous amount of communication now occurs online, and it is difficult to know whether a given piece of writing is genuine or deceitful. People deceive online for many reasons (e.g., monetary gain, political gain), and detecting this behavior without any physical interaction is a difficult task. Additionally, deception occurs in several text-only domains, and it is unclear whether these various sources can be leveraged to improve detection. To address this, we use eight datasets from various domains to evaluate their effect on classifier performance when combined with transfer learning via intermediate layer concatenation of fine-tuned BERT models. We find improvements in accuracy over the baseline. Furthermore, we evaluate multiple distance measurements between datasets and find that Jensen-Shannon distance correlates moderately with transfer learning performance. Finally, we evaluate the impact on BERT performance of multiple methods that add information to a dataset's text via named entities, and we find a notable improvement in accuracy of up to 11.2%.
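
To make the dataset-distance idea concrete, the following is a minimal sketch of a Jensen-Shannon distance between two text collections. It assumes whitespace-tokenized unigram frequencies over a shared vocabulary, which may differ from the features the paper actually uses; the function names and the toy texts are hypothetical and for illustration only.

```python
import math
from collections import Counter


def unigram_distribution(texts, vocab):
    """Relative frequency of each vocabulary word across a list of texts."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts[word] for word in vocab)
    return [counts[word] / total for word in vocab]


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (log base 2, range 0 to 1)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


def dataset_distance(texts_a, texts_b):
    """Jensen-Shannon distance between the unigram distributions of two datasets."""
    # Assumption: the vocabulary is the union of tokens seen in either dataset.
    vocab = sorted({tok for text in texts_a + texts_b for tok in text.lower().split()})
    p = unigram_distribution(texts_a, vocab)
    q = unigram_distribution(texts_b, vocab)
    return jensen_shannon_distance(p, q)


if __name__ == "__main__":
    # Toy, made-up texts standing in for two deception datasets.
    reviews = ["the hotel was great", "great room and great staff"]
    emails = ["claim your free prize now", "the prize is free"]
    print(dataset_distance(reviews, emails))
```

With base-2 logarithms the distance is 0 for identical distributions and 1 for distributions with no shared vocabulary, so values from different dataset pairs are directly comparable.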

Paper Structure

This paper contains 19 sections, 4 equations, 11 figures, and 5 tables.

Figures (11)

  • Figure 1: Distribution of Classes for all Deception Datasets
  • Figure 2: Term 1 of $DQI_{C1}$ for all datasets
  • Figure 3: Term 2 of $DQI_{C1}$ for all datasets
  • Figure 4: Term 3 of $DQI_{C1}$ for all datasets
  • Figure 5: Intermediate Layer Concatenation Process
  • ...and 6 more figures