Can We Mathematically Spot Possible Manipulation of Results in Research Manuscripts Using Benford's Law?

Teddy Lazebnik, Dan Gorlitsky

TL;DR

The findings show that Benford's law, adapted for aggregated data, can serve as an initial tool for identifying data manipulation. It is not a silver bullet, however: because its prediction accuracy is relatively low, each flagged manuscript requires further investigation.

Abstract

The reproducibility of academic research has long been a persistent issue, contradicting one of the fundamental principles of science. Even more concerning is the recent increase in false claims found in academic manuscripts, which casts doubt on the validity of reported results. In this paper, we use an adapted version of Benford's law, a statistical phenomenon that describes the distribution of leading digits in naturally occurring datasets, to identify potential manipulation of results in research manuscripts using only the aggregated data presented in those manuscripts. Our methodology applies the principles of Benford's law to analyses commonly employed in academic manuscripts, thus reducing the need for the raw data itself. To validate our approach, we applied our rules to 100 open-source datasets and correctly predicted 79% of them. Additionally, we analyzed 100 manuscripts published in the last two years across ten prominent economics journals, with ten manuscripts randomly sampled from each journal. Our analysis predicted a 3% occurrence of result manipulation with a 96% confidence level. Our findings uncover disturbing inconsistencies in recent studies and offer a semi-automatic method for their detection.
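
As background for the abstract, the classical form of Benford's law states that the leading digit d (1 through 9) appears with probability log10(1 + 1/d). The sketch below illustrates only this classical law on raw values; it is not the authors' adapted method for aggregated data, and the function names (`benford_expected`, `leading_digit`, `mean_abs_deviation`) are hypothetical names chosen for this illustration.

```python
import math
from collections import Counter


def benford_expected():
    """Expected leading-digit probabilities under classical Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """Return the first significant digit of a nonzero number."""
    x = float(x)
    if x == 0:
        raise ValueError("zero has no leading digit")
    # Scientific notation puts the first significant digit first.
    return int(f"{abs(x):.15e}"[0])


def mean_abs_deviation(values):
    """Mean absolute gap between observed and Benford digit frequencies.

    Assumption: this is one simple conformity score among several
    possible ones; the paper's own test may differ.
    """
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    expected = benford_expected()
    n = len(digits)
    return sum(abs(counts.get(d, 0) / n - expected[d]) for d in range(1, 10)) / 9


if __name__ == "__main__":
    powers_of_two = [2 ** n for n in range(100)]  # known to track Benford closely
    one_of_each = list(range(1, 10))              # every leading digit equally often
    print("powers of two:", round(mean_abs_deviation(powers_of_two), 4))
    print("uniform digits:", round(mean_abs_deviation(one_of_each), 4))
```

A larger deviation is only a signal worth a closer look, not evidence of manipulation; many legitimate datasets (for example, values confined to a narrow range) do not follow Benford's law at all.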

Paper Structure (5 sections, 1 equation, 1 figure, 2 tables)

Figures (1)

  • Figure 1: A schematic view of this study. First, we outline the mathematical framework based on Benford's theory. Next, we outline the data acquisition process for the experiments. Finally, we present the experimental setup and results, including a method validation experiment and an evaluation of recent economics studies, followed by an analysis of the results and a discussion of their implications.