
AIGS: Generating Science from AI-Powered Automated Falsification

Zijun Liu, Kaiming Liu, Yiqi Zhu, Xuanyu Lei, Zonghan Yang, Zhenhe Zhang, Peng Li, Yang Liu

TL;DR

This work defines AI-Generated Science (AIGS) and centers falsification as the core mechanism of scientific discovery, proposing Baby-AIGS as a practical baby step toward fully autonomous, end-to-end AIGS. It introduces a three-agent stack (ProposalAgent, ReviewAgent, FalsificationAgent) and a Domain-Specific Language (DSL) that translates ideas into executable experiments, paired with a multi-sampling strategy to enhance exploration. Through three ML-focused experiments (data engineering, self-instruct alignment, and language modeling), Baby-AIGS demonstrates meaningful autonomous discovery and iterative creativity, while acknowledging that its current performance lags behind expert researchers and that robust falsification remains challenging. The paper also discusses limitations and actionable insights, ethical considerations, and a roadmap for expanding AIGS toward broader scientific domains and responsible deployment.
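A minimal sketch of the control flow summarized above (and in Figure 3), under heavily simplified assumptions: an idea is a dictionary of on/off method components, experiment execution and review are toy functions, and every name here (COMPONENTS, run_experiment, review, propose, pre_falsification, falsify, the ablation margin) is hypothetical rather than taken from Baby-AIGS. It shows multi-sampling inside the Pre-Falsification loop and an ablation-style falsification step that keeps a candidate discovery only if removing the component measurably hurts the result.

```python
import random
from dataclasses import dataclass

# Hypothetical idea space: each idea switches method components on or off.
COMPONENTS = ["dedup", "shuffle"]

@dataclass
class Turn:
    idea: dict
    score: float
    review: str

def run_experiment(idea, rng):
    """Toy stand-in for experiment execution: only 'dedup' truly helps."""
    base = 0.5 + (0.2 if idea["dedup"] else 0.0)
    return base + rng.uniform(-0.01, 0.01)  # small measurement noise

def review(score):
    """Toy stand-in for ReviewAgent feedback."""
    return "promising" if score > 0.6 else "weak"

def propose(history, rng):
    """Toy ProposalAgent: refine the best idea, or restart after weak feedback."""
    if not history or history[-1].review == "weak":
        return {c: rng.random() < 0.5 for c in COMPONENTS}
    best = max(history, key=lambda t: t.score).idea
    flip = rng.choice(COMPONENTS)
    return dict(best, **{flip: not best[flip]})

def pre_falsification(turns, samples, seed=0):
    """Iterate propose -> experiment -> review, logging each turn as history."""
    rng = random.Random(seed)
    history = []
    for _ in range(turns):
        # Multi-sampling: draw several candidates per turn and keep the best one.
        candidates = [propose(history, rng) for _ in range(samples)]
        score, idea = max(((run_experiment(c, rng), c) for c in candidates),
                          key=lambda pair: pair[0])
        history.append(Turn(idea, score, review(score)))
    return history

def falsify(history, margin=0.05, seed=1):
    """Toy FalsificationAgent: keep a hypothesis only if an ablation confirms it."""
    rng = random.Random(seed)
    best = max(history, key=lambda t: t.score).idea
    discoveries = []
    for component, enabled in best.items():
        if not enabled:
            continue
        # Hypothesis: "this component causes the gain". Ablate it and re-measure.
        ablated = dict(best, **{component: False})
        gain = run_experiment(best, rng) - run_experiment(ablated, rng)
        if gain > margin:
            discoveries.append(component)
    return discoveries

history = pre_falsification(turns=5, samples=4)
print(falsify(history))
```

The ablation test is just one simple way to operationalize falsification: a hypothesis survives only when its predicted effect exceeds the noise margin, so the inert "shuffle" component is never reported. The paper's FalsificationAgent may identify and verify discoveries differently.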

Abstract

The rapid development of artificial intelligence has drastically accelerated scientific discovery. Trained on large-scale observational data, deep neural networks extract underlying patterns in an end-to-end manner and assist human researchers with highly precise predictions in unseen scenarios. The recent rise of Large Language Models (LLMs) and the autonomous agents they empower enables scientists to get help through interaction at different stages of their research, including but not limited to literature review, research ideation, idea implementation, and academic writing. However, AI researchers instantiated as foundation-model-empowered agents with full-process autonomy are still in their infancy. In this paper, we study AI-Generated Science (AIGS), where agents independently and autonomously complete the entire research process and discover scientific laws. By revisiting the definition of scientific research, we argue that falsification is the essence of both the human research process and the design of an AIGS system. Through the lens of falsification, prior systems that attempt AI-Generated Science either lack this component in their design or rely heavily on existing verification engines that narrow their use to specialized domains. In this work, we propose Baby-AIGS as a baby-step demonstration of a full-process AIGS system: a multi-agent system whose agents play roles representing key stages of the research process. By introducing FalsificationAgent, which identifies and then verifies possible scientific discoveries, we empower the system with explicit falsification. Experiments on three tasks preliminarily show that Baby-AIGS can produce meaningful scientific discoveries, though not on par with experienced human researchers. Finally, we discuss in detail the limitations of the current Baby-AIGS, actionable insights, and related ethical issues.

Paper Structure

This paper contains 66 sections, 5 equations, 12 figures, 8 tables.

Figures (12)

  • Figure 1: Examples of scientific research processes conducted by human researchers. Explicit falsification serves as a vital stage for falsifying or verifying the proposed hypotheses through either empirical or theoretical experiments, leading to the final scientific discovery.
  • Figure 2: Overview of the four paradigms of AI-accelerated scientific discovery systems.
  • Figure 3: Overview of our Baby-AIGS system design. The left part shows the Pre-Falsification phase, where ProposalAgent iteratively refines the proposed idea and methodology based on empirical and verbal feedback from ExpAgent, ReviewAgent, etc. The iterative process accumulates multi-turn logs as history context, from which FalsificationAgent can produce scientific discoveries in the Falsification phase, shown in the right part. The other modules are optional for automated full-process research.
  • Figure 4: The relationship between formalization degree and system executability when expressing ideas in Natural Language (NL), Coding Language (CL), and a Domain-Specific Language (DSL), illustrated with examples. NL expresses ideas in the simplest and most flexible form but is not executable; CL offers greater precision but makes error-free implementation hard to achieve; DSL achieves a better tradeoff between flexibility and executability.
  • Figure 5: The DSL design in Baby-AIGS for the research topics studied in the experimental setup section. The full demonstration is in the appendix on experiment details.
  • ...and 7 more figures
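Figures 4 and 5 argue that a DSL trades some of natural language's flexibility for executability. The sketch below illustrates that idea with a hypothetical three-operation DSL for a data-engineering pipeline; the syntax and the operation names lowercase, dedup and min_len are assumptions, not the actual Baby-AIGS DSL. Every program is validated against a fixed operation table before any step runs, so a malformed idea fails early with a clear message instead of crashing mid-experiment.

```python
# Hypothetical operation table: name -> (argument types, implementation).
OPS = {
    "lowercase": ((), lambda data: [s.lower() for s in data]),
    "dedup": ((), lambda data: list(dict.fromkeys(data))),  # order-preserving
    "min_len": ((int,), lambda data, n: [s for s in data if len(s) >= n]),
}

def parse(program):
    """Validate a whole program before running any step."""
    steps = []
    for lineno, line in enumerate(program.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        name, *raw = line.split()
        if name not in OPS:
            raise ValueError(f"line {lineno}: unknown operation {name!r}")
        types, fn = OPS[name]
        if len(raw) != len(types):
            raise ValueError(f"line {lineno}: {name} expects {len(types)} argument(s)")
        try:
            args = [t(r) for t, r in zip(types, raw)]
        except ValueError:
            raise ValueError(f"line {lineno}: bad argument for {name}") from None
        steps.append((fn, args))
    return steps

def run(program, data):
    """Execute a validated program step by step on a list of strings."""
    for fn, args in parse(program):
        data = fn(data, *args)
    return data

print(run("lowercase\ndedup\nmin_len 3", ["Hello world", "hello world", "hi"]))
# prints ['hello world']
```

Restricting ideas to a small, typed vocabulary is what buys executability here: anything the parser accepts is guaranteed to map onto a known implementation, at the cost of not being able to express operations outside the table.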