Generalization Gaps in Political Fake News Detection: An Empirical Study on the LIAR Dataset
S Mahmudul Hasan, Shaily Roy, Akib Jawad Nafis
TL;DR
Text-only political fake-news detection is constrained by the limited discriminative signal in short political statements, as shown on the LIAR dataset. The authors run a broad diagnostic benchmark of nine algorithms using lexical (BoW/TF-IDF) and semantic (GloVe) representations, revealing a consistent performance ceiling (~0.32 Weighted F1 for fine-grained classification, ~0.64 for binary) and a large generalization gap, with high-capacity models memorizing the training data. SMOTE augmentation fails to improve results, indicating that the bottleneck is semantic ambiguity rather than distributional imbalance. The study concludes that increasing model complexity yields limited gains without external knowledge, and suggests that future work integrate external evidence, knowledge sources, or multi-modal signals for robust political fact-checking.
Abstract
The proliferation of linguistically subtle political disinformation poses a significant challenge to automated fact-checking systems. Despite increasing emphasis on complex neural architectures, the empirical limits of text-only linguistic modeling remain underexplored. We present a systematic diagnostic evaluation of nine machine learning algorithms on the LIAR benchmark. By isolating lexical features (Bag-of-Words, TF-IDF) and semantic embeddings (GloVe), we uncover a hard "Performance Ceiling", with fine-grained classification not exceeding a Weighted F1-score of 0.32 across models. Crucially, a simple linear SVM (Accuracy: 0.624) matches the performance of pre-trained Transformers such as RoBERTa (Accuracy: 0.620), suggesting that model capacity is not the primary bottleneck. We further diagnose a massive "Generalization Gap" in tree-based ensembles, which achieve more than 99% training accuracy but collapse to approximately 25% on test data, indicating reliance on lexical memorization rather than semantic inference. Synthetic data augmentation via SMOTE yields no meaningful gains, confirming that the limitation is semantic (feature ambiguity) rather than distributional. These findings indicate that for political fact-checking, increasing model complexity without incorporating external knowledge yields diminishing returns.
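The two quantities the abstract relies on, the Weighted F1-score and the train/test "Generalization Gap", can be made concrete with a short sketch. This is an illustration under stated assumptions, not the authors' evaluation code: it assumes Weighted F1 means the support-weighted mean of per-class F1 scores (the usual convention) and that the gap is training accuracy minus test accuracy. The function names and the toy labels in the usage note are hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 (assumed definition of Weighted F1)."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        # F1 = 2TP / (2TP + FP + FN); the denominator is positive because n >= 1.
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += (n / total) * f1
    return score


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def generalization_gap(train_true, train_pred, test_true, test_pred):
    """Training accuracy minus test accuracy (assumed definition of the gap)."""
    return accuracy(train_true, train_pred) - accuracy(test_true, test_pred)
```

For example, calling `generalization_gap` on a model that reproduces every training label but gets only one of four hypothetical test labels right gives a gap of 0.75, the same qualitative pattern as the tree-based ensembles described above (more than 99% training accuracy against roughly 25% test accuracy).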
