Automated Scientific Discovery: From Equation Discovery to Autonomous Discovery Systems

Stefan Kramer, Mattia Cerrato, Jannis Brugger, Sašo Džeroski, Ross King

TL;DR

The paper surveys automated scientific discovery from equation discovery and symbolic regression to autonomous, closed-loop discovery systems and robot scientists. It canvasses historical approaches, modern neural-guided methods, and neural operators for dynamics, while discussing autonomy levels, evaluation benchmarks, and open challenges. Key contributions include framing the integration gap between interpretable scientific knowledge and autonomous experimentation, and outlining a roadmap toward human-competitive AI scientists. The work underscores the practical importance of developing open-ended, communicable, and robust autonomous discovery systems that can operate across domains and eventually address grand challenges like the Nobel Turing Grand Challenge.

Abstract

The paper surveys automated scientific discovery, from equation discovery and symbolic regression to autonomous discovery systems and agents. It discusses the individual approaches from a "big picture" perspective and in context, but also discusses open issues and recent topics like the various roles of deep neural networks in this area, aiding in the discovery of human-interpretable knowledge. Further, we will present closed-loop scientific discovery systems, starting with the pioneering work on the Adam system up to current efforts in fields from material science to astronomy. Finally, we will elaborate on autonomy from a machine learning perspective, but also in analogy to the autonomy levels in autonomous driving. The maximal level, level five, is defined to require no human intervention at all in the production of scientific knowledge. Achieving this is one step towards solving the Nobel Turing Grand Challenge to develop AI Scientists: AI systems capable of making Nobel-quality scientific discoveries highly autonomously at a level comparable, and possibly superior, to the best human scientists by 2050.

Paper Structure

This paper contains 15 sections, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Overview of the two realms of automated scientific discovery: (i) the discovery and communication of human-interpretable knowledge in a representation used by scientists in the field, e.g., equations (right-hand side), and (ii) autonomy and automation in science (left-hand side). Approaches integrating both are currently rare.
  • Figure 2: (a) BACON Langley1977, Langley1987 (b) Example of a context-free grammar guiding the search for equations in the Lagramge system Todorovski1997 (c) A probabilistic context-free grammar as used in ProGED Brence2021 (d) Symbolic regression Schmidt2009
  • Figure 3: Workflow of Cranmer et al. Cranmer2020: GNNs as an intermediate representation to support or enable the learning process
  • Figure 4: Neural network architecture of the model that extracts known and unknown physical parameters from oscillating time series Garcon2022.
  • Figure 5: Six steps of the scientific process.
  • ...and 1 more figure