Measuring Falseness in News Articles based on Concealment and Overstatement
Jiyoung Lee, Keeheon Lee
TL;DR
This work tackles the problem of measuring falseness in real-world journalism by defining two lexical indicators, Concealment and Overstatement, and grounding them against a full-story reference from a Korean fact-checking resource. It assembles a matched dataset of 43 false and 43 real Korean articles across science, politics, and civics, processed with KoNLPy/Mecab, and analyzes the data with regression, nonparametric tests, and classifiers. False news tends to conceal information and overstate content more than real news: the regression fit is $R^2 = 0.2624$ for real news and $R^2 = 0.0171$ for false news, and both indicators differ significantly between the two groups ($p = 1.523 \times 10^{-11}$ for concealment, $p = 3.945 \times 10^{-8}$ for overstatement). The indicators also discriminate well, with classifiers reaching up to 0.92 accuracy. The approach offers a practical framework for early detection of partial falsity that can help readers, journalists, and fact-checkers foster a resilient information environment, and it sets the stage for multilingual studies of falseness using aligned real/false content.
Abstract
This research investigates the extent of misinformation in journalistic articles by introducing a novel tool for measuring degrees of falsity. It uses two metrics, concealment and overstatement, to explore how information comes to be interpreted as false. This helps examine how articles containing partly true and partly false information can harm readers, since such articles are harder to identify than completely fabricated ones. In this study, the full story provided by the fact-checking website serves as a standardized source of information for comparing fake and real news. The results suggest that false news shows greater concealment and overstatement, because longer and more complex news stories are shortened and phrased ambiguously. Although there are no major differences among the politics, science, and civics categories, the results show that misinformation lacks crucial details while containing more redundant words. Hence, news articles containing partial falsity, categorized as misinformation, can deceive inattentive readers who lack background knowledge. We hope this approach encourages fact-checkers, journalists, and readers to secure high-quality articles for a resilient information environment.
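The exact formulas for the two indicators are not given here, so the following is only a minimal sketch under stated assumptions. It treats concealment as the share of full-story tokens that an article omits ("lacks crucial details") and overstatement as the share of article tokens not backed by the full story ("more redundant words"). Both definitions, the function names `concealment` and `overstatement`, and the whitespace tokenizer (a stand-in for KoNLPy/Mecab morphological analysis) are hypothetical illustrations, not the authors' method.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokens stand in for the
    # KoNLPy/Mecab morphemes used in the paper.
    return text.lower().split()


def concealment(article: str, full_story: str) -> float:
    """Hypothetical: share of full-story tokens the article omits."""
    ref = Counter(tokenize(full_story))
    art = Counter(tokenize(article))
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter subtraction keeps only positive counts: reference tokens
    # (with multiplicity) that the article does not cover.
    missing = ref - art
    return sum(missing.values()) / total


def overstatement(article: str, full_story: str) -> float:
    """Hypothetical: share of article tokens not backed by the full story."""
    ref = Counter(tokenize(full_story))
    art = Counter(tokenize(article))
    total = sum(art.values())
    if total == 0:
        return 0.0
    # Article tokens beyond what the reference contains, repeats included.
    excess = art - ref
    return sum(excess.values()) / total


if __name__ == "__main__":
    story = "the vaccine reduced infections by half in a small trial"
    article = "vaccine ends infections forever vaccine"
    print(round(concealment(article, story), 3))    # 0.8
    print(round(overstatement(article, story), 3))  # 0.6
```

With these toy strings, the article omits 8 of the 10 full-story tokens (concealment 0.8), and 3 of its 5 tokens are unsupported by the full story (overstatement 0.6). Counting with multiplicity means a repeated word such as the doubled "vaccine" raises overstatement, loosely mirroring the redundancy described above; a set-based variant would ignore repeats.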
