Establishing trust in automated reasoning

Konrad Hinsen

TL;DR

This paper argues that independent reviewing of automated reasoning systems is essential to establish trust in computational science, especially amid the reproducibility crisis. It introduces five dimensions that shape the reviewability of software, analyzes representative case studies (NumPy, GSL, GROMACS) to illustrate strengths and limitations, and advocates a framework of four measures to improve reliability: review the reviewable, align with science rather than industry, emphasize situated and convivial software, and make software explainable using Digital Scientific Notations. It also discusses the need for institutional support, stable infrastructure, and new practices to manage dependencies and environments, highlighting the potential for formalized specifications and narrative-code tools to enhance transparency. The practical significance lies in reducing wasted effort and increasing epistemic diversity by enabling robust, context-aware assessments of software reliability in scientific workflows.

Abstract

Since its beginnings in the 1940s, automated reasoning by computers has become a tool of ever-growing importance in scientific research. So far, the rules underlying automated reasoning have mainly been formulated by humans, in the form of program source code. Rules derived from large amounts of data, via machine learning techniques, are a complementary approach currently under intense development. The question of why we should trust these systems, and the results obtained with their help, has been discussed by philosophers of science but has so far received little attention from practitioners. The present work focuses on independent reviewing, an important source of trust in science, and identifies the characteristics of automated reasoning systems that affect their reviewability. It also discusses possible steps towards increasing reviewability and trustworthiness via a combination of technical and social measures.

Paper Structure

This paper contains 20 sections, 3 figures.

Figures (3)

  • Figure 1: The five dimensions of scientific software that influence its reviewability.
  • Figure 2: A typical software stack as used in a research project.
  • Figure 3: Four measures that can be taken to make scientific software more trustworthy.