From Euler to Today: Universal Mathematical Fallibility. A Large-Scale Computational Analysis of Errors in ArXiv Papers
Igor Rivin
TL;DR
The paper addresses undetected mathematical errors in the published literature by applying automated computational review to over 37,000 ArXiv mathematics papers, producing complete referee-style reports. It reports field-specific error rates of $9.6\%$ in Numerical Analysis, $6.5\%$ in Geometric Topology, and $0\%$ in Category Theory (in the complete sample of 93 papers), and documents errors in historical works by Euler and Dirichlet, underscoring that fallibility spans centuries. The study also demonstrates the system's ability to construct counterexamples and to assign journal-tier recommendations (e.g., $0.4\%$ Top Generalist, $15.5\%$ Top Field-Leading), and argues that the approach is discipline-agnostic and extensible to other domains such as physics and computer science. Overall, the work offers a practical path toward automated verification at scale and highlights the potential for human-AI collaboration to improve the reliability of mathematical publishing.
Abstract
We present the results of a large-scale computational analysis of mathematical papers from the ArXiv repository, using a system that not only detects mathematical errors but also produces complete referee reports with journal-tier recommendations. Our automated analysis system processed over 37,000 papers across multiple mathematical categories, revealing significant error rates and quality distributions. Remarkably, the system identified errors in papers spanning three centuries of mathematics, including works by Leonhard Euler (1707-1783) and Peter Gustav Lejeune Dirichlet (1805-1859), as well as by contemporary Fields medalists. In Numerical Analysis (math.NA), we observed an error rate of 9.6\% (2,271 errors in 23,761 papers), while Geometric Topology (math.GT) showed 6.5\% (862 errors in 13,209 papers). Strikingly, Category Theory (math.CT) showed 0\% errors in the 93 papers analyzed, with evidence suggesting that results in this area are ``easier'' for automated analysis. Beyond error detection, the system evaluated papers for journal suitability, recommending 0.4\% for top generalist journals and 15.5\% for top field-specific journals, and assigning the remainder to specialist venues. These findings demonstrate both the universality of mathematical error across all eras and the feasibility of automated, comprehensive mathematical peer review at scale. The methodology, while applied here to mathematics, is discipline-agnostic and could be readily extended to physics, computer science, and other fields represented in the ArXiv repository.
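As a quick consistency check on the figures quoted in the abstract, the following minimal sketch recomputes the per-category error rates and the total paper count from the reported counts. The dictionary layout and the helper name `error_rate_percent` are illustrative assumptions, not part of the paper's system, and the sketch assumes each rate is simply the reported error count divided by the number of papers analyzed.

```python
# Counts as quoted in the abstract: (errors reported, papers analyzed).
# The data layout and helper name are assumptions made for illustration.
reported = {
    "math.NA": (2271, 23761),
    "math.GT": (862, 13209),
    "math.CT": (0, 93),
}


def error_rate_percent(errors: int, papers: int) -> float:
    """Return errors / papers as a percentage, rounded to one decimal place."""
    return round(100.0 * errors / papers, 1)


rates = {cat: error_rate_percent(e, n) for cat, (e, n) in reported.items()}
total_papers = sum(n for _, n in reported.values())

for cat, rate in rates.items():
    print(f"{cat}: {rate}%")
print(f"total papers: {total_papers}")
```

Under these assumptions the recomputed rates match the quoted 9.6\%, 6.5\%, and 0\%, and the three categories together account for 37,063 papers, consistent with "over 37,000".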
