Collaborative Evolution: Multi-Round Learning Between Large and Small Language Models for Emergent Fake News Detection

Ziyi Zhou, Xiaoming Zhang, Shenghan Tan, Litian Zhang, Chaozhuo Li

TL;DR

The paper tackles emergent fake news detection under distribution shift and annotation scarcity. It introduces MRCD, a collaborative framework where an LLM provides demonstrations and external knowledge to an SLM, while a two-stage retrieval module curates relevant, up-to-date examples and knowledge; a data-selection scheme and multi-round learning iteratively refine labels and model capabilities. The approach achieves state-of-the-art performance on real-world datasets (Pheme and Twitter16), with substantial accuracy gains over purely SLM-based baselines and robust ablations validating each component. The work demonstrates that integrating retrieval-augmented demonstrations, external knowledge, and iterative self-labeling enables reliable detection of evolving misinformation, with practical implications for scalable fake news monitoring.

Abstract

The proliferation of fake news on social media platforms has had a substantial and harmful influence on society. Conventional deep learning methods built on small language models (SLMs) require extensive supervised training and adapt poorly to rapidly evolving events. Large language models (LLMs), despite their robust zero-shot capabilities, fall short in identifying fake news because they lack pertinent demonstrations and because the relevant knowledge changes over time. In this paper, we propose a novel framework, Multi-Round Collaboration Detection (MRCD), to address these limitations. MRCD combines the strengths of both model types by integrating the generalization ability of LLMs with the specialized functionality of SLMs. Our approach features a two-stage retrieval module that selects relevant and up-to-date demonstrations and knowledge, enhancing in-context learning for better detection of emerging news events. We further design a multi-round learning framework to ensure more reliable detection results. MRCD achieves state-of-the-art (SOTA) results on two real-world datasets, Pheme and Twitter16, with accuracy improvements of 7.4% and 12.8% over using SLMs alone, which effectively addresses the limitations of current models and improves the detection of emergent fake news.
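The multi-round idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the acceptance rule (both models agree and both are confident), the `min_conf` threshold, the label convention, and every function name below are assumptions made for illustration. The real framework also retrains the SLM and retrieves demonstrations and knowledge in each round, which the sketch abstracts into predictors that receive the current labeled pool.

```python
from typing import Callable, Dict, List, Tuple

Label = int  # 0 = real, 1 = fake (convention assumed for this sketch)
Example = Tuple[str, Label]
# A predictor takes a news text and the current labeled pool and returns
# (predicted label, confidence in [0, 1]). Hypothetical interface.
Predictor = Callable[[str, List[Example]], Tuple[Label, float]]


def multi_round_collaboration(
    unlabeled: List[str],
    seed_pool: List[Example],
    llm_predict: Predictor,
    slm_predict: Predictor,
    rounds: int = 3,
    min_conf: float = 0.8,
) -> Dict[str, Label]:
    """Pseudo-label items over several rounds, growing the labeled pool."""
    pool = list(seed_pool)       # demonstrations visible to both models
    remaining = list(unlabeled)  # items that still lack a trusted label
    accepted: Dict[str, Label] = {}
    for _ in range(rounds):
        newly_labeled: List[Example] = []
        still_open: List[str] = []
        for text in remaining:
            llm_label, llm_conf = llm_predict(text, pool)
            slm_label, slm_conf = slm_predict(text, pool)
            # Assumed selection rule: keep only confident agreements.
            if llm_label == slm_label and min(llm_conf, slm_conf) >= min_conf:
                newly_labeled.append((text, llm_label))
            else:
                still_open.append(text)
        if not newly_labeled:
            break  # no progress this round, so later rounds cannot help
        accepted.update(dict(newly_labeled))
        pool.extend(newly_labeled)  # next round sees the new pseudo-labels
        remaining = still_open
    return accepted


# Toy stand-ins for the two models (purely illustrative).
def toy_llm(text: str, pool: List[Example]) -> Tuple[Label, float]:
    return (1 if "hoax" in text else 0), 0.9


def toy_slm(text: str, pool: List[Example]) -> Tuple[Label, float]:
    label = 1 if "hoax" in text else 0
    # The toy SLM only becomes confident once the pool has grown.
    conf = 0.9 if len(pool) >= 2 or "clearly" in text else 0.5
    return label, conf


result = multi_round_collaboration(
    ["clearly a hoax", "hoax again", "regular update"],
    [("seed real", 0)],
    toy_llm,
    toy_slm,
)
print(result)
```

In this toy run only one item is accepted in the first round; its pseudo-label enlarges the pool, which lets the remaining items be labeled in the second round. That is the kind of iterative refinement the abstract attributes to multi-round learning, shown here under the stated assumptions.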

Paper Structure

This paper contains 30 sections, 6 equations, 4 figures, 8 tables, 1 algorithm.

Figures (4)

  • Figure 1: (a) illustrates two major challenges for emergent fake news detection: distribution shift and lack of annotations. (b) first shows that an SLM performs well on past events after training, but its judgment capability decreases significantly on emergent events. It then shows that an LLM performs poorly on emergent events in the zero-shot setting, but its detection effectiveness improves with few-shot learning on annotated data and with retrieval-augmented methods. Both experiments are conducted on the Twitter16 dataset with RoBERTa as the SLM and Llama3-8B as the LLM.
  • Figure 2: The architecture of MRCD. $\rightarrow$ denotes the first round of learning, while $\dashrightarrow$ denotes the subsequent rounds of learning.
  • Figure 3: Hyper-parameter sensitivity analysis of $\mathcal{N}$ and $\omega$.
  • Figure 4: A case study of how MRCD works.