Enabling Contextual Soft Moderation on Social Media through Contrastive Textual Deviation

Pujan Paudel; Mohammad Hammas Saeed; Rebecca Auger; Chris Wells; Gianluca Stringhini

TL;DR

This work tackles contextual false positives in automated soft moderation by reframing stance detection as Contrastive Textual Deviation (CTD), which anchors stance to a consensus statement and uses contrastive supporting and refuting markers. CTD is bootstrapped with large language models and subsequently fine-tuned on a large, diverse triplet corpus, achieving strong cross-domain performance that surpasses traditional stance methods and prior baselines. When integrated as a post-retrieval filter in the Lambretta soft moderation system, CTD dramatically reduces contextual false positives from 20% to 2.1% while maintaining low false negatives, enabling more granular and reliable warning deployment on social media. The approach demonstrates robust generalization across platforms and topics (climate, health, politics) and offers practical pathways for scalable, context-aware moderation in real-world deployments.
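To make the triplet idea concrete, here is a minimal sketch of how one consensus statement with its contrastive supporting and refuting statements might be expanded into text-to-text training pairs. The class name `CTDTriplet`, the field names, the input template, and the label strings are all assumptions made for illustration; the paper's actual prompt and label format are not given in this summary.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CTDTriplet:
    """Hypothetical container: a consensus statement plus two contrastive statements."""
    consensus: str   # the statement that stance is anchored to
    supporting: str  # a statement that agrees with the consensus
    refuting: str    # a statement that contradicts the consensus


def to_examples(triplet: CTDTriplet) -> List[Tuple[str, str]]:
    """Expand one triplet into two labeled (input_text, label) pairs.

    The template and the labels "supports"/"refutes" are illustrative
    assumptions, not the format used by the authors.
    """
    template = "consensus: {c} statement: {s}"
    return [
        (template.format(c=triplet.consensus, s=triplet.supporting), "supports"),
        (template.format(c=triplet.consensus, s=triplet.refuting), "refutes"),
    ]
```

Pairs in this shape could be fed to any sequence-to-sequence fine-tuning loop; anchoring both statements to the same consensus text is what makes the two examples contrastive.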

Abstract

Automated soft moderation systems are unable to ascertain if a post supports or refutes a false claim, resulting in a large number of contextual false positives. This limits their effectiveness, for example by undermining trust in health experts when warnings are added to their posts, or by resorting to vague warnings instead of granular fact-checks, which desensitizes users. In this paper, we propose to incorporate stance detection into existing automated soft-moderation pipelines, with the goal of ruling out contextual false positives and providing more precise recommendations for social media content that should receive warnings. We develop a textual deviation task called Contrastive Textual Deviation (CTD) and show that it outperforms existing stance detection approaches when applied to soft moderation. We then integrate CTD into Lambretta, the state-of-the-art system for automated soft moderation, showing that our approach can reduce contextual false positives from 20% to 2.1%, providing another important building block towards deploying reliable automated soft moderation tools on social media.
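The pipeline idea in the abstract, where a stance check runs after candidate retrieval and only posts that support the false claim receive a warning, can be sketched as follows. This is not Lambretta's or CTD's implementation: `filter_candidates` and `toy_classifier` are hypothetical names, and the keyword-based classifier is a stand-in for a fine-tuned stance model, used only so the example runs on its own.

```python
from typing import Callable, Iterable, List

# A stance classifier takes (false_claim, post) and returns
# "supports" or "refutes" relative to the false claim (assumed labels).
StanceClassifier = Callable[[str, str], str]


def filter_candidates(
    claim: str, posts: Iterable[str], classify: StanceClassifier
) -> List[str]:
    """Keep only retrieved posts that support the false claim.

    Posts that refute the claim are contextual false positives
    and are dropped before any warning is attached.
    """
    return [post for post in posts if classify(claim, post) == "supports"]


def toy_classifier(claim: str, post: str) -> str:
    """Illustrative placeholder only: flags a few refutation cue phrases."""
    cues = ("debunked", "myth", "no evidence", "false claim")
    lowered = post.lower()
    return "refutes" if any(cue in lowered for cue in cues) else "supports"


if __name__ == "__main__":
    claim = "COVID-19 is caused by 5G."
    retrieved = [
        "5G towers are spreading COVID-19!",
        "The claim that 5G causes COVID-19 has been debunked.",
    ]
    for post in filter_candidates(claim, retrieved, toy_classifier):
        print("warn:", post)
```

Passing the classifier as a parameter keeps the filter independent of any particular model, so the keyword stand-in can be swapped for a learned stance model without changing the retrieval code.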

Paper Structure (16 sections, 7 figures, 9 tables)

This paper contains 16 sections, 7 figures, 9 tables.

Introduction
Datasets
Motivation: Existing stance detection methods fall short
Need for Granularity (R1)
Need for Claim Invariancy (R2)
Need for contrastive context awareness (R3)
Contrastive Textual Deviation
Task Definition
Bootstrapping CTD using LLMs
Evaluation
Fine tuning FLAN-T5 for CTD
Comparison with existing baselines
Model size and performance tradeoff
Integrating CTD into Lambretta
Related Work
...and 1 more section

Figures (7)

Figure 1: Three tweets discussing the debunked claim that COVID-19 is caused by 5G. Existing moderation systems might suffer from topical false positives as well as contextual false positives.
Figure 2: Example claims from evaluation dataset and corresponding triplets.
Figure 3: T-SNE embeddings of supporting and refuting statements from COVID-CQ
Figure 4: Prompting structure for bootstrapping CTD.
Figure 5: Example claims and perspectives from PERSPECTRUM dataset.
...and 2 more figures
