Establishing trust in automated reasoning
Konrad Hinsen
TL;DR
This paper argues that independent reviewing of automated reasoning systems is essential to establish trust in computational science, especially amid the reproducibility crisis. It introduces five dimensions that shape the reviewability of software, analyzes representative case studies (NumPy, GSL, GROMACS) to illustrate strengths and limitations, and advocates a framework of four measures to improve reliability: review the reviewable, align with science rather than industry, emphasize situated and convivial software, and make software explainable using Digital Scientific Notations. It also discusses the need for institutional support, stable infrastructure, and new practices to manage dependencies and environments, highlighting the potential for formalized specifications and narrative-code tools to enhance transparency. The practical significance lies in reducing wasted effort and increasing epistemic diversity by enabling robust, context-aware assessments of software reliability in scientific workflows.
Abstract
Since its beginnings in the 1940s, automated reasoning by computers has become a tool of ever-growing importance in scientific research. So far, the rules underlying automated reasoning have mainly been formulated by humans, in the form of program source code. Rules derived from large amounts of data, via machine learning techniques, are a complementary approach currently under intense development. The question of why we should trust these systems, and the results obtained with their help, has been discussed by philosophers of science but has so far received little attention from practitioners. The present work focuses on independent reviewing, an important source of trust in science, and identifies the characteristics of automated reasoning systems that affect their reviewability. It also discusses possible steps towards increasing reviewability and trustworthiness via a combination of technical and social measures.
