Correcting misinformation on social media with a large language model

Xinyi Zhou; Ashish Sharma; Amy X. Zhang; Tim Althoff

TL;DR

To address misinformation on social media, the paper presents Muse, a retrieval-augmented, multimodal system that identifies the inaccurate parts of a piece of content and explains why, with grounded references. Muse comprises three components: a response generator built on an LLM, a hierarchical, credibility-aware web retriever, and a multimodal integrator that converts images into text descriptions for evidence retrieval. In expert evaluations across 464 posts, Muse's responses received an average overall quality score of 8.1/10, outperforming GPT-4 by 37% and high-helpfulness layperson responses by 29%. An end-user perception study (n=988) shows that Muse's corrections increase the correct belief that misinformation is misleading by 9.8%, at a cost of about $0.50 per post at the time of the study. The approach generalizes across modalities, domains, and political leanings. Limitations include no support for video input, evaluation only on English-language content from X Community Notes, and reliance on retrieving credible sources.

Abstract

Real-world information, often multimodal, can be misinformed or potentially misleading due to factual errors, outdated claims, missing context, misinterpretation, and more. Such "misinformation" is understudied, challenging to address, and harms many social domains, particularly on social media, where it can spread rapidly. Manual correction that identifies and explains its (in)accuracies is widely accepted but difficult to scale. While large language models (LLMs) can generate human-like language that could accelerate misinformation correction, they struggle with outdated information, hallucinations, and limited multimodal capabilities. We propose MUSE, an LLM augmented with vision-language modeling and web retrieval over relevant, credible sources. MUSE generates responses that determine whether and which part(s) of the given content can be misinformed or potentially misleading, and explains why with grounded references. We further define a comprehensive set of rubrics to measure response quality, ranging from the accuracy of identifications and the factuality of explanations to the relevance and credibility of references. Results show that MUSE consistently produces high-quality outputs across diverse social media content (e.g., modalities, domains, political leanings), including content that has not previously been fact-checked online. Overall, MUSE outperforms GPT-4 by 37% and even high-quality responses from social media users by 29%. Our work provides a general methodological and evaluative framework for correcting misinformation at scale.

Paper Structure (1 section, 31 figures)

Figures (31)

Figure 1: Overview of Muse, an LLM augmented to handle images and to access timely knowledge from credible publishers, so that it can identify and explain the (in)accuracies in a piece of multimodal content with accurate and trustworthy references. Given a piece of content that may or may not be misinformation, Muse searches for related and credible web pages, from which it extracts evidence as refutations or context. Using this evidence, Muse generates a response that identifies and explains the (in)accuracies within the input content. a: Informative image captioning. Muse augments image captioning models with celebrity recognition and optical character recognition to generate informative descriptions of images. b: Retrieval of related web pages. Muse retrieves web pages using LLM-generated queries and a web search engine, then filters them based on their multimodal relevance to the given content. c: Credibility evaluation of the publishers of web pages. d: Evidence-assisted response generation. Muse filters and ranks publishers based on their professionally rated factuality and bias. Starting from the web pages with the highest factuality and least bias, it uses an LLM to extract evidence that refutes or contextualizes the given content. It continues down the ranking and stops once it has obtained sufficient refutations (i.e., at least two pages refute the misinformation) or has gone through all the credible pages. Finally, it generates a response by providing an LLM with the extracted evidence. Besides identifying and correcting the false post shown here, Muse can also identify and respond to accurate, partially accurate, and factually accurate but potentially misleading content (see example Muse responses in the Supplementary Figures).
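The ranking-and-stopping logic in panel d can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each page carries hypothetical integer `factuality` and `bias` ratings for its publisher (higher factuality is better, lower bias is better), a hypothetical credibility threshold `min_factuality`, and a toy keyword matcher `toy_extract` standing in for the LLM evidence extractor. Only the "stop after at least two refutations or after all credible pages" rule comes from the caption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Page:
    url: str
    factuality: int  # hypothetical publisher rating; higher = more factual
    bias: int        # hypothetical publisher rating; 0 = least biased
    text: str


@dataclass
class Evidence:
    url: str
    kind: str  # "refutation" or "context"
    snippet: str


def rank_credible(pages: List[Page], min_factuality: int) -> List[Page]:
    """Drop low-factuality publishers, then rank by highest factuality, least bias."""
    kept = [p for p in pages if p.factuality >= min_factuality]
    return sorted(kept, key=lambda p: (-p.factuality, p.bias))


def gather_evidence(
    pages: List[Page],
    extract: Callable[[Page], Optional[Evidence]],
    min_factuality: int = 3,      # hypothetical threshold
    needed_refutations: int = 2,  # "at least two pages" per the caption
) -> List[Evidence]:
    """Walk down the credibility ranking, stopping once enough refutations are found."""
    evidence: List[Evidence] = []
    refutations = 0
    for page in rank_credible(pages, min_factuality):
        item = extract(page)  # in Muse this step is performed by an LLM
        if item is None:
            continue
        evidence.append(item)
        if item.kind == "refutation":
            refutations += 1
            if refutations >= needed_refutations:
                break  # sufficient refutations; remaining pages are not visited
    return evidence


def toy_extract(page: Page) -> Optional[Evidence]:
    """Hypothetical stand-in for the LLM extractor: plain keyword matching."""
    lowered = page.text.lower()
    if "false" in lowered:
        return Evidence(page.url, "refutation", page.text)
    if "context" in lowered:
        return Evidence(page.url, "context", page.text)
    return None


# Hypothetical demo data; URLs and ratings are made up for illustration
DEMO_PAGES = [
    Page("https://a.example/1", 5, 0, "The claim is false."),
    Page("https://b.example/2", 2, 0, "The claim is false."),  # below threshold
    Page("https://c.example/3", 4, 1, "Some background context."),
    Page("https://d.example/4", 4, 0, "The claim is false."),
    Page("https://e.example/5", 3, 0, "The claim is false."),
]

if __name__ == "__main__":
    for ev in gather_evidence(DEMO_PAGES, toy_extract):
        print(ev.kind, ev.url)
```

With the demo pages, the ranking is a, d, c, e (b falls below the threshold); the loop finds refutations on a and d and stops before visiting c or e. Sorting on the tuple `(-factuality, bias)` is one simple way to express "highest factuality first, then least bias"; the paper's actual ranking may differ.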
Figure 2: Results of expert evaluation ($p<2\times10^{-5}$ for every pairwise comparison between approaches in a-n, Mann-Whitney U test; experiments=84). a: The overall quality of Muse-generated responses ($\text{mean}\pm\text{SD}$: $8.1\pm2.0$; $n=232$) is 29% higher than that of laypeople's high-helpfulness responses ($6.3\pm 2.0$; $n=232$), 37% higher than that of GPT-4-generated responses ($5.9\pm 2.7$; $n=232$), and 56% higher than that of laypeople's average-helpfulness responses ($5.2\pm 2.1$; $n=230$). b-f: Quality of identifying and explaining inaccuracies. Compared with GPT-4-generated responses and laypeople's high- and average-helpfulness responses, Muse-generated responses identify and explain inaccuracies more explicitly (b), identify inaccuracies more comprehensively and with fewer mistakes, that is, fewer cases of labeling an accurate claim as inaccurate or an inaccurate claim as accurate (c-d), and explain inaccuracies more accurately and informatively (e-f). g-k: Quality of generated text. Muse's generated text is more relevant to the misinformation it responds to, and more factual, than GPT-4's generated text and the text of laypeople's high- and average-helpfulness responses (g-h). Muse-generated text is more fluent and coherent than the text of laypeople's high-helpfulness responses, and also less toxic than the text of laypeople's average-helpfulness responses (i-k). l-n: Quality of links as references. Muse rarely hallucinates references, whereas GPT-4 does so frequently; Muse also provides significantly more reachable links that are relevant to the generated text (l-m). Muse's references are more credible than those offered in laypeople's high- and average-helpfulness responses (n). Note that laypeople's responses were created on average 14 hours after the social media post. Here, Muse only retrieved web pages published before the post (Methods).
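The percentages in panel a are consistent with relative differences in mean overall score, computed as (Muse mean - baseline mean) / baseline mean. A short Python check, using only the means reported in the caption, reproduces them; the function names are illustrative.

```python
# Mean overall-quality scores reported in panel a (0-10 scale)
MEAN_SCORES = {
    "Muse": 8.1,
    "Laypeople (high helpfulness)": 6.3,
    "GPT-4": 5.9,
    "Laypeople (average helpfulness)": 5.2,
}


def relative_gain(score: float, baseline: float) -> float:
    """Percentage by which `score` exceeds `baseline`."""
    return (score - baseline) / baseline * 100


def gains_over_baselines(scores: dict, reference: str = "Muse") -> dict:
    """Rounded relative gain of the reference approach over every other approach."""
    return {
        name: round(relative_gain(scores[reference], mean))
        for name, mean in scores.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, gain in gains_over_baselines(MEAN_SCORES).items():
        print(f"Muse vs {name}: +{gain}%")
```

This prints gains of 29%, 37%, and 56%, matching the caption.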
Figure 3: Quality of responses to social media posts across modalities, fact-checking statuses, political leanings, domains, and the tactics used to make all or part of a post false or misleading. a: Muse consistently outperforms GPT-4 and even laypeople's high-helpfulness responses by at least 21% when responding to textual content (n=155) and multimodal content (n=77). b: Muse outperforms GPT-4 and even laypeople's high-helpfulness responses by at least 28% when responding to content that has not been fact-checked online (n=195). c: Muse consistently outperforms GPT-4 and even laypeople's high-helpfulness responses by at least 26% when responding to liberal content (n=110) and conservative content (n=50). d: Muse consistently outperforms GPT-4 and even laypeople's high-helpfulness responses by at least 25% when responding to content about politics and international affairs (n=80), economy and business (n=38), crime and law (n=38), social issues and human rights (n=30), and health and medicine (n=24). e: Muse consistently outperforms GPT-4 and even laypeople's high-helpfulness responses by at least 19% when responding to misinformation that involves misinterpretation or misrepresentation (n=62), false or oversimplified causation (n=52), lack of context (n=35), fabrication (n=35), loaded language (n=30), false or biased data (n=29), and improper analogies or equivalences (n=19). Note that laypeople's responses were created on average 14 hours after the social media post. Here, Muse only retrieved web pages published before the post (Methods).
Figure S1: Distribution of potential misinformation in X Community Notes (as of February 2023) that received its first response (gray) or first high-quality response (orange) within a certain amount of time.
Figure S2: Examples showing that generating multiple queries helps decompose a post that may contain multiple claims, each of which needs to be verified, whereas generating a single query may overlook some of them and hence lead to incomplete identification of (in)accuracies. Bold text: claims needing verification that are overlooked when generating one query but captured when generating more than one.
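Why multiple queries help can be sketched in Python by merging the results of several queries. This is a hypothetical illustration, not Muse's retriever: in Muse the queries are LLM-generated and sent to a web search engine, whereas here `toy_search` looks them up in a made-up index, and the merge strategy (first-seen order, de-duplicated by URL, top `per_query` results per query) is an assumption. The multimodal relevance filtering from Figure 1b is omitted.

```python
from typing import Callable, Iterable, List


def retrieve_multi_query(
    queries: Iterable[str],
    search: Callable[[str], List[str]],
    per_query: int = 5,  # hypothetical cap on results kept per query
) -> List[str]:
    """Union the top results of several queries, keeping first-seen order without duplicates."""
    seen = set()
    merged: List[str] = []
    for query in queries:
        for url in search(query)[:per_query]:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical toy index: each query surfaces pages about one claim in the post
TOY_INDEX = {
    "query about claim A": [
        "https://a.example/1",
        "https://a.example/2",
        "https://shared.example/x",
    ],
    "query about claim B": ["https://b.example/1", "https://shared.example/x"],
}


def toy_search(query: str) -> List[str]:
    """Stand-in for a web search engine; unknown queries return nothing."""
    return TOY_INDEX.get(query, [])


if __name__ == "__main__":
    one = retrieve_multi_query(["query about claim A"], toy_search)
    both = retrieve_multi_query(["query about claim A", "query about claim B"], toy_search)
    print(len(one), len(both))
```

With a single query, pages about claim B never enter the candidate pool, so any inaccuracy in that claim cannot be identified; adding the second query brings in `b.example` while the shared page is kept only once.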
...and 26 more figures
