An Interactive Framework for Profiling News Media Sources

Nikhil Mehta, Dan Goldwasser

TL;DR

The paper targets the challenge of profiling news media sources for fake news and political bias in the wild, especially during emerging events. It proposes an interactive framework that combines graph-based community modeling, GPT-3-driven user summaries, and human validation to form information communities, expand them, and refine source profiling without relying on extensive labeled data. The approach shows substantial improvements over baselines in fully inductive, event-based tests (e.g., Black Lives Matter and Abortion/Feminism), with performance gains after only one to a few human interactions. The work highlights the practical potential of human-in-the-loop, LLM-augmented graph methods for rapid, robust media profiling in dynamic social media landscapes, while foregrounding ethics, limitations, and the need for careful deployment.

Abstract

The recent rise of social media has led to the spread of large amounts of fake and biased news, content published with the intent to sway beliefs. While detecting and profiling the sources that spread this news is important to maintain a healthy society, it is challenging for automated systems. In this paper, we propose an interactive framework for news media profiling. It combines the strengths of graph-based news media profiling models, pre-trained large language models, and human insight to characterize the social context on social media. Experimental results show that with as few as 5 human interactions, our framework can rapidly detect fake and biased news media, even in the most challenging settings of emerging news events, where test data is unseen.

Paper Structure (40 sections, 6 figures, 9 tables, 1 algorithm)

Figures (6)

  • Figure 1: Overview of our framework: using trained graph models, a Large Language Model (GPT-3), and human interaction to form information communities for news media profiling. (Key: U = Users, A = Articles, S = Sources, Light Red Background = Candidate Information Communities, Light Blue Background = Validated Information Communities.) From the learned graph model (b), we find candidate information communities through k-means clustering. Using an LLM, GPT-3, we form a textual representation of each information community by summarizing its users, and then ask humans to narrow down the community based on the user summaries (d), forming smaller, validated communities whose users are then connected to each other. We then expand the validated communities by again clustering users with the model and forming user summaries, but this time asking GPT-3 whether or not to place each user into a validated community; this step can be repeated (e). The entire process (c-e) can repeat, starting with clustering of unassigned users (c), to form more validated communities, which can be expanded further.
  • Figure 2: An example of the prompt we used to determine the user summary. Based on their bio, meta-data, and tweets, we create a summary.
  • Figure 3: An example of the output shown by Chat-GPT when provided with user summaries and asked to predict similarity. Note that the output is often vague, which is why human interactions are necessary.
  • Figure 4: An example of the prompt we used to determine community membership for one of the human validated information communities. We use the first paragraph as a 1-shot example, to prompt the model. User 1 and 2 are both critical of the Black Lives Matter movement protests, and thus part of the same community, while User 3 is in support of it, and thus shouldn't be in the community. Based on this, we prompt GPT-3 with additional users (in this case User 4, 5, and 6), and ask it to determine which users belong in the community and which do not.
  • Figure 5: LLM Failure Case: In this case, the LLM (Chat-GPT) can't find any communities, but it is clear that at least User 2 and User 3 should be in the same community, as they are both against the Black Lives Matter movement.
  • ...and 1 more figure
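
The candidate-then-validated community steps described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the user embeddings stand in for the output of the trained graph model, `summarize` stands in for the GPT-3 user summary, `human_select` stands in for the human validation step, and the `min_size` threshold is a hypothetical choice. All function and parameter names are invented for this sketch, and k-means is written in plain Python only to keep the example self-contained.

```python
import random
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = _mean(members)
    return labels


def candidate_communities(user_embeddings: Dict[str, Vector], k: int) -> List[List[str]]:
    """Cluster users (by assumed graph-model embeddings) into candidate communities."""
    users = sorted(user_embeddings)
    labels = kmeans([user_embeddings[u] for u in users], k)
    groups: List[List[str]] = [[] for _ in range(k)]
    for user, lab in zip(users, labels):
        groups[lab].append(user)
    return [g for g in groups if g]


def validate_communities(
    candidates: List[List[str]],
    summarize: Callable[[str], str],                  # stand-in for the LLM summary
    human_select: Callable[[Dict[str, str]], List[str]],  # stand-in for the annotator
    min_size: int = 2,                                # hypothetical threshold
) -> List[List[str]]:
    """Show user summaries to a human and keep the narrowed-down communities."""
    validated = []
    for community in candidates:
        summaries = {user: summarize(user) for user in community}
        kept = human_select(summaries)
        if len(kept) >= min_size:
            validated.append(sorted(kept))
    return validated


if __name__ == "__main__":
    embeddings = {"u1": (0.0, 0.0), "u2": (0.1, 0.0), "u3": (5.0, 5.0), "u4": (5.1, 5.0)}
    candidates = candidate_communities(embeddings, k=2)
    validated = validate_communities(
        candidates,
        summarize=lambda user: f"summary of {user}",
        human_select=lambda summaries: list(summaries),  # annotator keeps everyone
    )
    print(validated)
```

In this sketch the expansion step (e) would reuse the same clustering call on unassigned users and replace `human_select` with an LLM membership decision; that part is omitted here because the caption does not specify it in enough detail.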