Comparing GPT-4 and Open-Source Language Models in Misinformation Mitigation

Tyler Vergho; Jean-Francois Godbout; Reihaneh Rabbany; Kellin Pelrine

TL;DR

The paper compares misinformation-detection performance across the proprietary GPT-4 and open-source models. Zephyr-7b emerges as a consistently viable open-source alternative that narrows the gap with GPT-4 on key datasets such as LIAR and LIAR-New. The paper also evaluates structured JSON output and function calling, which enable reliable parsing and integration into downstream systems. GPT-3.5 is highly sensitive to prompt wording, and GPT-4 updates offer incremental stability rather than dramatic gains. These findings support the growing viability of open-source LLMs for misinformation mitigation and argue for broader adoption of structured-output techniques in MLOps and research pipelines.

Abstract

Recent large language models (LLMs) have been shown to be effective for misinformation detection. However, the choice of LLMs for experiments varies widely, leading to uncertain conclusions. In particular, GPT-4 is known to be strong in this domain, but it is closed source, potentially expensive, and can show instability between different versions. Meanwhile, alternative LLMs have given mixed results. In this work, we show that Zephyr-7b presents a consistently viable alternative, overcoming key limitations of commonly used approaches like Llama-2 and GPT-3.5. This provides the research community with a solid open-source option and shows open-source models are gradually catching up on this task. We then highlight how GPT-3.5 exhibits unstable performance, such that this very widely used model could provide misleading results in misinformation detection. Finally, we validate new tools including approaches to structured output and the latest version of GPT-4 (Turbo), showing they do not compromise performance, thus unlocking them for future research and potentially enabling more complex pipelines for misinformation mitigation.
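
To make the idea of structured output concrete, the following is a minimal sketch of how a JSON-formatted model reply might be validated before it enters a downstream pipeline. The field names ("label", "explanation"), the binary label set, and the function name parse_verdict are illustrative assumptions, not the schema or code used in the paper.

```python
import json
from typing import Optional

# Hypothetical label set; the paper's actual output schema may differ.
ALLOWED_LABELS = {"true", "false"}


def parse_verdict(raw: str) -> Optional[dict]:
    """Parse a model reply that is expected to be a JSON object.

    Returns a normalized dict, or None when the reply is not valid JSON
    or does not carry a recognized label.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Free-text replies (no JSON) are rejected rather than guessed at.
        return None
    if not isinstance(data, dict):
        return None
    label = str(data.get("label", "")).strip().lower()
    if label not in ALLOWED_LABELS:
        return None
    return {"label": label, "explanation": str(data.get("explanation", ""))}


if __name__ == "__main__":
    print(parse_verdict('{"label": "False", "explanation": "No supporting source."}'))
    print(parse_verdict("I think this statement is probably false."))
```

Rejecting unparseable replies outright, instead of extracting a label from free text, keeps the failure cases countable, which is one reason structured output can simplify evaluation.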
