Multimodal Clickbait Detection by De-confounding Biases Using Causal Representation Inference

Jianxing Yu, Shiqi Wang, Han Yin, Zhenlong Sun, Ruobing Xie, Bo Zhang, Yanghui Rao

TL;DR

This paper proposes a debiased method, based on causal inference, for detecting clickbait posts on the Web. It uses invariant and causal factors to build a robust model with good generalization ability.

Abstract

This paper focuses on detecting clickbait posts on the Web. These posts often use eye-catching disinformation in mixed modalities to mislead users into clicking for profit. This harms the user experience, so content providers block such posts. To escape detection, malicious creators add irrelevant non-bait content to bait posts, dressing them up as legitimate to fool the detector. This content often has biased relations with non-bait labels, yet traditional detectors tend to make predictions based on simple co-occurrence rather than grasping the inherent factors that lead to malicious behavior. This spurious bias easily causes misjudgments. To address this problem, we propose a new debiased method based on causal inference. We first employ a set of features in multiple modalities to characterize the posts. Because these features are often mixed up with unknown biases, we then disentangle three kinds of latent factors from them: the invariant factor, which indicates intrinsic bait intention; the causal factor, which reflects deceptive patterns in a certain scenario; and non-causal noise. By eliminating the noise that causes bias, we can use the invariant and causal factors to build a robust model with good generalization ability. Experiments on three popular datasets show the effectiveness of our approach.
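
The data flow described in the abstract (features in, three latent factors out, noise discarded before prediction) can be illustrated with a toy sketch. This is not the authors' architecture or training procedure: the class `FactorSplitter`, its random linear heads, and the sigmoid scorer are hypothetical stand-ins, and nothing here learns to disentangle anything. It only shows which factors reach the classifier.

```python
import math
import random


def make_weights(rows, cols, rng):
    """Random dense weight matrix (placeholder for learned parameters)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def linear(weights, x):
    """Dense projection: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class FactorSplitter:
    """Hypothetical three-head encoder; an assumption, not the paper's model."""

    def __init__(self, in_dim, factor_dim, seed=0):
        rng = random.Random(seed)
        self.w_invariant = make_weights(factor_dim, in_dim, rng)
        self.w_causal = make_weights(factor_dim, in_dim, rng)
        self.w_noise = make_weights(factor_dim, in_dim, rng)
        # The scorer only has weights for the invariant and causal factors.
        self.w_out = [rng.uniform(-1.0, 1.0) for _ in range(2 * factor_dim)]

    def disentangle(self, features):
        """Map one multimodal feature vector to three latent factors."""
        return {
            "invariant": linear(self.w_invariant, features),
            "causal": linear(self.w_causal, features),
            "noise": linear(self.w_noise, features),
        }

    def score(self, features):
        """Bait probability computed from invariant + causal factors only."""
        factors = self.disentangle(features)
        kept = factors["invariant"] + factors["causal"]  # noise is dropped
        logit = sum(w * v for w, v in zip(self.w_out, kept))
        return 1.0 / (1.0 + math.exp(-logit))


splitter = FactorSplitter(in_dim=4, factor_dim=3, seed=1)
post_features = [0.2, -0.5, 1.0, 0.3]  # made-up feature vector
print(splitter.score(post_features))
```

Because the scorer never reads the noise factor, changing the noise head leaves the prediction unchanged, which is the property the abstract relies on for robustness.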

Paper Structure

This paper contains 25 sections, 5 equations, 12 figures, 6 tables, 1 algorithm.

Figures (12)

  • Figure 1: Clickbait samples. The purple arrow indicates inconsistencies between the headline and its linked article. The simple clickbait post contains conspicuous bait-indicative words or advertising content (marked with blue boxes), which are easily detected. The complex one disguises the bait content with some valid content (in red boxes) to make it look inconspicuous, thus deceiving the detector and escaping detection.
  • Figure 2: The overview framework of our causal clickbait detector.
  • Figure 3: Causal structure for de-confounding biases. Gray and white nodes represent observable and unobserved variables, respectively; the bidirectional and unidirectional arrows denote correlations and causalities, respectively; purple arrows are key causalities determining result $Y$; and orange arrows refer to scenario effects.
  • Figure 4: PR curves of all models on three datasets.
  • Figure 5: Evaluation of the impact of $\mathbf{m}$ settings on F1.
  • ...and 7 more figures