Truth in Text: A Meta-Analysis of ML-Based Cyber Information Influence Detection Approaches
Jason M. Pittman
TL;DR
This paper assesses the effectiveness of ML-based cyber information influence detection (misinformation, disinformation, fake news) through a three-stage meta-analysis of 40 peer-reviewed studies, compiling 153 accuracy values from 81 model types and 37 datasets. It finds a pooled unweighted mean accuracy of approximately $78.19\%$, with substantial within-model variance and no statistically significant differences between the major ML approach families (ANOVA $p\approx0.997$). The results highlight persistent fragmentation in datasets and reporting practices, which limits universal conclusions about the best technique and underscores the need for replication, standardization, and open resources. The work advocates deeper, subgroup-focused analyses, standardized benchmarks, and attention to adversarial robustness to advance practical, transparent misinformation detection systems.
Abstract
Cyber information influence, or disinformation in general terms, is widely regarded as one of the biggest threats to social progress and government stability. From US presidential elections to European Union referendums and down to regional news reporting of wildfires, lies and post-truths have normalized radical decision-making. Accordingly, there has been an explosion in research seeking to detect disinformation in online media. The frontier of disinformation detection research leverages a variety of ML techniques. Some studies use traditional ML algorithms such as Support Vector Machines, Random Forest, and Naïve Bayes. Others apply deep learning models, including Convolutional Neural Networks, Long Short-Term Memory networks, and transformer-based architectures. Despite the overall success of such techniques, the literature shows inconsistencies when viewed holistically, which limits our understanding of their true effectiveness. This work therefore employed a two-stage meta-analysis to (a) derive an overall meta statistic for ML model effectiveness in detecting disinformation and (b) investigate effectiveness by subgroups of ML model types. The study found that the majority of the 81 ML detection techniques sampled achieve greater than 80\% accuracy, with a mean sample effectiveness of 79.18\% accuracy. Meanwhile, the subgroups showed no statistically significant differences between approaches but revealed high within-group variance. Based on these results, this work recommends future research on replication and on the development of detection methods operating at the ML model level.
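The two quantities summarized in the abstract, an overall pooled accuracy and a between-subgroup comparison, can be illustrated with a minimal sketch. The accuracy values, the group names, and the helper functions `pooled_unweighted_mean` and `one_way_anova_f` below are hypothetical and are not taken from the study. The sketch assumes a plain unweighted mean and a standard one-way ANOVA F statistic, which may differ from the exact procedure the paper uses.

```python
from statistics import mean

# Hypothetical accuracy values (percent), grouped by approach family.
# Illustrative numbers only; they are NOT data from the study.
accuracies = {
    "traditional_ml": [72.0, 85.5, 91.0, 64.0],
    "deep_learning": [70.5, 88.0, 93.5, 61.0],
    "transformer": [75.0, 82.0, 95.0, 60.0],
}


def pooled_unweighted_mean(groups):
    """Mean of all reported accuracy values, each counted equally."""
    values = [v for vals in groups.values() for v in vals]
    return mean(values)


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for vals in groups.values() for v in vals]
    grand_mean = mean(all_values)
    k = len(groups)        # number of subgroups
    n = len(all_values)    # total number of accuracy values

    # Variation of subgroup means around the grand mean.
    ss_between = sum(
        len(vals) * (mean(vals) - grand_mean) ** 2 for vals in groups.values()
    )
    # Variation of individual values around their own subgroup mean.
    ss_within = sum(
        (v - mean(vals)) ** 2 for vals in groups.values() for v in vals
    )

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


pooled = pooled_unweighted_mean(accuracies)
f_stat, df_b, df_w = one_way_anova_f(accuracies)
print(f"pooled mean = {pooled:.3f}, F({df_b}, {df_w}) = {f_stat:.5f}")
```

An F statistic far below 1, as these made-up values produce, corresponds to the pattern the abstract describes: subgroup means that are nearly identical while the spread inside each subgroup is large. Converting F to a p-value requires the F distribution, for example via `scipy.stats.f.sf(f_stat, df_b, df_w)` if SciPy is available.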
