Exploring the Potential of the Large Language Models (LLMs) in Identifying Misleading News Headlines

Md Main Uddin Rony, Md Mahfuzul Haque, Mohammad Ali, Ahmed Shatil Alam, Naeemul Hassan

TL;DR

This study investigates how large language models (LLMs) can identify misleading news headlines. Using 60 articles across health, science & tech, and business domains, it compares ChatGPT-3.5, ChatGPT-4, and Gemini, finding ChatGPT-4 to be the most accurate, especially when human annotators unanimously agree on misleading headlines. The results reveal that model performance strongly depends on consensus level, with notable gaps in mixed-consensus scenarios, underscoring the need for human-centered evaluation and ethical alignment in AI-driven misinformation detection. The work highlights practical implications for journalists, developers, and policymakers and suggests future directions toward explainable, ethically aware, and multimodal misinformation detection systems.

Abstract

In the digital age, the prevalence of misleading news headlines poses a significant challenge to information integrity, necessitating robust detection mechanisms. This study explores the efficacy of Large Language Models (LLMs) in identifying misleading versus non-misleading news headlines. Utilizing a dataset of 60 articles, sourced from both reputable and questionable outlets across health, science & tech, and business domains, we employ three LLMs (ChatGPT-3.5, ChatGPT-4, and Gemini) for classification. Our analysis reveals significant variance in model performance, with ChatGPT-4 demonstrating superior accuracy, especially in cases with unanimous annotator agreement on misleading headlines. The study emphasizes the importance of human-centered evaluation in developing LLMs that can navigate the complexities of misinformation detection, aligning technical proficiency with nuanced human judgment. Our findings contribute to the discourse on AI ethics, emphasizing the need for models that are not only technically advanced but also ethically aligned and sensitive to the subtleties of human interpretation.

Paper Structure (19 sections, 2 tables)