Let Silence Speak: Enhancing Fake News Detection with Generated Comments from Large Language Models

Qiong Nan, Qiang Sheng, Juan Cao, Beizhe Hu, Danding Wang, Jintao Li

TL;DR

This paper proposes to adopt large language models as a user simulator and comment generator, and designs GenFEND, a generated feedback-enhanced detection framework, which generates comments by prompting LLMs with diverse user profiles and aggregating generated comments from multiple subpopulation groups.

Abstract

Fake news detection plays a crucial role in protecting social media users and maintaining a healthy news ecosystem. Among existing works, comment-based fake news detection methods have been empirically shown to be promising because comments can reflect users' opinions, stances, and emotions and deepen models' understanding of fake news. Unfortunately, due to exposure bias and users' different willingness to comment, it is not easy to obtain diverse comments in reality, especially in early detection scenarios. Without the comments from the "silent" users, the perceived opinions may be incomplete, which in turn affects news veracity judgment. In this paper, we explore the possibility of finding an alternative source of comments to guarantee the availability of diverse comments, especially those from silent users. Specifically, we propose to adopt large language models (LLMs) as a user simulator and comment generator, and design GenFEND, a generated feedback-enhanced detection framework, which generates comments by prompting LLMs with diverse user profiles and aggregates the generated comments from multiple subpopulation groups. Experiments demonstrate the effectiveness of GenFEND, and further analysis shows that the generated comments cover more diverse users and could even be more effective than actual comments.

Paper Structure (19 sections, 9 equations, 5 figures, 8 tables)

This paper contains 19 sections, 9 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Existing fake news detection methods rely on (a) the news content itself and (b) limited comments from actively commenting users only. Unlike (a) and (b), our GenFEND uses (c) diverse comments generated by large language models from both potentially active and silent simulated users.
  • Figure 2: Overview of the Generated Feedback Enhanced Detection (GenFEND) framework. (a) Multi-View Comment Generation: pre-define different user profiles from three demographic characteristics (gender, age, and education), then prompt the LLM to generate comments by role-playing these users. (b) Multi-Subpopulation Feedback Understanding: split the generated comments into subpopulation groups for each view, then extract the semantic feature $\boldsymbol{s}^{mean}_p$ for each subpopulation group $p$ and the diversity representation $\boldsymbol{d}^\mathcal{V}$ for each view $\mathcal{V}$. (c) Aggregation and Classification: perform intra-view aggregation via the dot product between the semantic features $\{\boldsymbol{s}^{mean}_p\}_{p \in \left\{1, ..., m_\mathcal{V}\right\}}$ in each view $\mathcal{V}$ and the news feature $\boldsymbol{e}^o$; perform inter-view aggregation with a fusion gate that takes the news feature $\boldsymbol{e}^o$ and the diversity representation $\boldsymbol{d} = \oplus_{\mathcal{V} \in \left\{\mathcal{G}, \mathcal{A}, \mathcal{E}\right\}}\boldsymbol{d}^\mathcal{V}$ as input, yielding the final feature $\boldsymbol{r}$ of the generated comments; finally, concatenate $\boldsymbol{r}$ and $\boldsymbol{e}^o$ (and $\boldsymbol{e}^c_{actual}$ if available) for classification.
  • Figure 3: Early detection performance of dEFEND and dEFEND w/ GenFEND with 1, 2, 4, 8, and 16 actual comments for the testing data.
  • Figure 4: Macro F1 scores of BERT w/ GenFEND with 30 generated comments from different numbers of users (30/15/10). Each user generates the same number of comments.
  • Figure 5: Average conformity scores of generated comments to each attribute. The bars are in "male; female" order for gender, from young to old for age, and from low-level to high-level degree for education.
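
The aggregation steps named in the Figure 2 caption can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: comment embeddings are plain lists of floats, the dot-product scores are normalized with a softmax, the diversity representation of a view is stood in for by the mean pairwise Euclidean distance between its subpopulation features, and the fusion gate is a single linear layer whose weights (`gate_weights`) are supplied by the caller. All function and parameter names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

VIEWS = ("gender", "age", "education")  # the three views from Figure 2


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vec(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def group_means(comments, view):
    """Split comments by one profile attribute and average each group."""
    groups = defaultdict(list)
    for profile, embedding in comments:
        groups[profile[view]].append(embedding)
    return {value: mean_vec(embs) for value, embs in groups.items()}


def view_diversity(means):
    """Assumed stand-in for d^V: mean pairwise distance between groups."""
    pairs = list(combinations(means.values(), 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def aggregate(comments, news_feature, gate_weights):
    view_features, diversity = [], []
    for view in VIEWS:
        means = group_means(comments, view)
        vecs = list(means.values())
        # Intra-view: score each subpopulation feature against the news feature.
        attn = softmax([dot(s, news_feature) for s in vecs])
        view_features.append(weighted_sum(attn, vecs))
        diversity.append(view_diversity(means))
    # Inter-view: gate over the three views, fed with e^o and d.
    gate_input = list(news_feature) + diversity
    gate = softmax([dot(row, gate_input) for row in gate_weights])
    r = weighted_sum(gate, view_features)
    # Concatenate r with the news feature for a downstream classifier.
    return r + list(news_feature)


# Toy usage: two generated comments with 2-dimensional embeddings.
comments = [
    ({"gender": "male", "age": "young", "education": "college"}, [1.0, 0.0]),
    ({"gender": "female", "age": "old", "education": "college"}, [0.0, 1.0]),
]
news = [0.5, 0.5]
gate_weights = [[0.0] * (len(news) + len(VIEWS)) for _ in VIEWS]  # untrained gate
features = aggregate(comments, news, gate_weights)
```

With the all-zero gate used here, the three views are weighted equally; in a trained model the gate weights would be learned jointly with the classifier, so that views whose subpopulations disagree more can be weighted differently for each news item.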