Is Less Really More? Fake News Detection with Limited Information

Zhaoyang Cao, John Nguyen, Reza Zafarani

TL;DR

This work addresses the challenge of fake news detection under information scarcity by introducing the SLIM framework, which systematically selects limited information signals (keywords, sequences, metadata) and quantifies their information content with information-theoretic measures. By leveraging XLNet-base and four input variations (keyword, sequence, metadata, multimodal), SLIM demonstrates that selective cues can match or closely approach full-text performance while markedly reducing data and compute requirements. Key findings show that 30% keyword usage yields near-parity with full-text accuracy on two benchmarks, and multimodal fusion further boosts performance, whereas metadata alone is insufficient. The approach offers a practical, efficiency-driven path for robust fake news detection in sparse-data environments and real-time applications, with broad implications for scalable, multimodal information analysis.

Abstract

The threat that online fake news and misinformation pose to democracy, justice, public confidence, and especially to vulnerable populations, has led to a sharp increase in the need for fake news detection and intervention. Whether multi-modal or pure text-based, most fake news detection methods depend on textual analysis of entire articles. However, these methods come with certain limitations. For instance, methods that rely on full text can be computationally inefficient, demand large amounts of training data to achieve competitive accuracy, and may lack robustness across different datasets. This is because fake news datasets vary strongly in the level and types of information they provide: some include large paragraphs of text with images and metadata, while others consist of a few short sentences. Perhaps if one could use only minimal information to detect fake news, fake news detection methods could become more robust and resilient to the lack of information. We aim to overcome these limitations by detecting fake news using systematically selected, limited information that is both effective and capable of delivering robust, promising performance. We propose a framework called SLIM (Systematically-selected Limited Information) for fake news detection. In SLIM, we quantify the amount of information by introducing information-theoretic measures. SLIM leverages limited information to achieve performance in fake news detection comparable to that of state-of-the-art methods that use the full text. Furthermore, by combining various types of limited information, SLIM can perform even better while significantly reducing the quantity of information required for training compared to state-of-the-art language model-based fake news detection techniques.

Paper Structure

This paper contains 31 sections, 14 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: Extracting Keyword Information: the input is the body of the news under the proposed SLIM framework
  • Figure 2: Representation of information density by average normalized Shannon entropy (a) and the average count of tokens (b) on the ReCOVery dataset
  • Figure 3: Performance comparison across datasets for the $\textsf{SLIM}_{\textsc{keyword}}$ framework. All datasets achieve an accuracy ratio of over 96% when we extract 30% of the keywords; among them, the ReCOVery dataset shows an approximately 99% accuracy ratio.
  • Figure 4: Performance comparison across datasets for the $\textsf{SLIM}_{\textsc{sequence}}$ framework using POS-tagged words. The percentage of POS-tagged words (primarily adjectives and adverbs) that can be extracted from the full text is approximately 10% to 20%. However, using this small number of POS-tagged words can achieve an accuracy ratio of 94%.
  • Figure 5: Performance comparison of datasets of the $\textsf{SLIM}_{\textsc{multimodal}}$ frameworks. Generally, the integration of different types of limited information improves fake news detection accuracy compared to using only keywords ($\textsf{SLIM}_{\textsc{keyword}}$). In the Fake_And_Real_News dataset, the performance of keywords and NER words shows an approximately 0.5% decline compared to using only keywords.
  • ...and 1 more figures
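
Figure 2 reports information density as an average normalized Shannon entropy. The sketch below shows one way such a quantity could be computed for a single text. It is an illustration under stated assumptions, not the paper's definition: the function name `normalized_shannon_entropy` is hypothetical, tokens are taken as already split, and the entropy is normalized by the log of the number of distinct tokens so that the result lies in [0, 1].

```python
import math
from collections import Counter


def normalized_shannon_entropy(tokens):
    """Entropy of the token frequency distribution, scaled to [0, 1].

    Assumption: normalization divides by log2(number of distinct tokens),
    the maximum entropy attainable with that vocabulary. The paper may
    use a different normalization.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    # With zero or one distinct token there is no uncertainty to measure.
    if len(counts) <= 1:
        return 0.0
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in counts.values()
    )
    return entropy / math.log2(len(counts))


if __name__ == "__main__":
    # Whitespace tokenization is an assumption made for this example only.
    text = "the vaccine claim spread fast and the claim was false"
    print(normalized_shannon_entropy(text.split()))
```

Averaging this value over all articles in a dataset would give a per-dataset density figure; a text of entirely distinct tokens scores 1, and a text that repeats a single token scores 0.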