The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

Yutaro Yamada, Robert Tjarko Lange, Cong Lu, Shengran Hu, Chris Lu, Jakob Foerster, Jeff Clune, David Ha

TL;DR

The AI Scientist-v2 demonstrates autonomous end-to-end scientific discovery through agentic tree search, template-free experimentation, and Vision-Language Model–assisted evaluation. It advances prior work by enabling domain-general idea generation and parallel exploration, culminating in the first AI-generated manuscript accepted at a peer-reviewed workshop. The study discusses both successes and limitations, addresses ethical considerations, and open-sources the project to foster broader research. Collectively, it illustrates a tangible step toward scalable AI-driven science while underscoring the need for rigorous validation and responsible oversight for real-world deployment.

Abstract

AI is increasingly playing a pivotal role in transforming how scientific discoveries are made. We introduce The AI Scientist-v2, an end-to-end agentic system capable of producing the first entirely AI-generated peer-review-accepted workshop paper. This system iteratively formulates scientific hypotheses, designs and executes experiments, analyzes and visualizes data, and autonomously authors scientific manuscripts. Compared to its predecessor (v1, Lu et al., 2024, arXiv:2408.06292), The AI Scientist-v2 eliminates the reliance on human-authored code templates, generalizes effectively across diverse machine learning domains, and leverages a novel progressive agentic tree-search methodology managed by a dedicated experiment manager agent. Additionally, we enhance the AI reviewer component by integrating a Vision-Language Model (VLM) feedback loop for iterative refinement of the content and aesthetics of the figures. We evaluated The AI Scientist-v2 by submitting three fully autonomous manuscripts to a peer-reviewed ICLR workshop. Notably, one manuscript achieved high enough scores to exceed the average human acceptance threshold, marking the first instance of a fully AI-generated paper successfully navigating peer review. This accomplishment highlights the growing capability of AI in conducting all aspects of scientific research. We anticipate that further advancements in autonomous scientific discovery technologies will profoundly impact human knowledge generation, enabling unprecedented scalability in research productivity and significantly accelerating scientific breakthroughs, greatly benefiting society at large. We have open-sourced the code at https://github.com/SakanaAI/AI-Scientist-v2 to foster the future development of this transformative technology. We also discuss the role of AI in science, including AI safety.

Paper Structure

This paper contains 49 sections, 13 figures, 4 tables.

Figures (13)

  • Figure 1: The AI Scientist-v2 Workflow. The workflow consists of several phases covering automated idea generation, experiment execution, figure visualization, manuscript writing, and reviewing. Unlike the initial version, The AI Scientist-v2 removes the dependency on human-coded templates. Instead, it employs agentic tree search (managed by an Experiment Progress Manager across several stages, orange) to generate and refine code implementations. Subsequent experimentation leverages the best-performing code checkpoints (nodes) from the tree search to iteratively test various research hypotheses.
  • Figure 2: The AI Scientist-v2 workflow showing different stages of tree-based experimentation. Stage 1 begins at the root node, where initial experiment code is generated in parallel. After running the experiment code and visualization scripts, each node is classified based on the outcome: if an error occurs, it is marked as a buggy node; otherwise, it is labeled as a non-buggy node. New child nodes are created differently depending on their parent node's status: for non-buggy nodes, refinement is applied to improve the experiment code for better performance; for buggy nodes, the system attempts to debug them using stored error information. The best-performing node, selected by LLM-based evaluation, is passed down as the root node of Stage 2. From this root node, child nodes are created for hyperparameter tuning. The top-performing node from Stage 2 is then used to initialize Stage 3, where the system executes the research agenda, applies refinements, and performs debugging as needed. In Stage 4, similar to Stage 2, the root node generates ablation nodes. Additionally, replication nodes repeat the same experiment as their parent node, while aggregation nodes collect results from replication nodes to generate combined visualizations and summaries.
  • Figure 3: Peer-reviewed ICBINB workshop paper generated by The AI Scientist-v2. The paper investigates the usage of a temporal consistency regularizer on the embeddings of an LSTM-based sequence model. The results discuss the effect of the regularizer on compositional regularization and highlight the difficulty of training models capable of improved generalization. It received peer-review scores of 6 (weak accept), 7 (accept), and 6 (weak accept) before meta-review and ranked among the top 45% of submitted workshop papers.
  • Figure 4: Example of the data generating function used in the experiments.
  • Figure 5: The generated model class shows an embedding layer, a single LSTM layer, and a linear layer head.
  • ...and 8 more figures
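
The staged tree search described in the Figure 2 caption can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Node` fields, the `expand`, `run_stage`, and `run_pipeline` helpers, the numeric `score` (a stand-in for the LLM-based evaluation), the "expand the best non-buggy node first" policy, and the `toy_run` executor are all assumptions made for illustration. Replication and aggregation nodes are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    code: str                       # experiment code held by this node
    buggy: bool = False             # True if running the code raised an error
    score: float = 0.0              # hypothetical stand-in for LLM-based evaluation
    error: Optional[str] = None     # stored error information for debugging
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


# A runner takes (parent, action) and returns the resulting child node.
Runner = Callable[[Node, str], Node]

# Stage names paraphrase the caption; they are labels only in this sketch.
STAGES = ["initial implementation", "hyperparameter tuning",
          "research agenda", "ablations"]


def expand(parent: Node, run: Runner) -> Node:
    """Create one child: debug a buggy parent, refine a non-buggy one."""
    action = "debug" if parent.buggy else "refine"
    child = run(parent, action)
    child.parent = parent
    parent.children.append(child)
    return child


def run_stage(root: Node, run: Runner, budget: int) -> Node:
    """Grow a tree from `root` for `budget` steps; return the best non-buggy node."""
    nodes = [root]
    for _ in range(budget):
        good = [n for n in nodes if not n.buggy]
        # Assumed policy: refine the best working node, else debug the newest one.
        parent = max(good, key=lambda n: n.score) if good else nodes[-1]
        nodes.append(expand(parent, run))
    good = [n for n in nodes if not n.buggy]
    if not good:
        raise RuntimeError("no non-buggy node found within the budget")
    return max(good, key=lambda n: n.score)


def run_pipeline(root: Node, run: Runner, budget_per_stage: int) -> Node:
    """Pass the best node of each stage down as the root of the next stage."""
    best = root
    for _ in STAGES:
        stage_root = Node(code=best.code, buggy=best.buggy,
                          score=best.score, error=best.error)
        best = run_stage(stage_root, run, budget_per_stage)
    return best


def toy_run(parent: Node, action: str) -> Node:
    """Deterministic toy executor: debugging always succeeds, refining adds 0.1."""
    if action == "debug":
        return Node(code=parent.code + "+fix", score=0.5)
    return Node(code=parent.code + "+refine", score=parent.score + 0.1)


if __name__ == "__main__":
    start = Node(code="v0", buggy=True, error="ImportError")
    final = run_pipeline(start, toy_run, budget_per_stage=2)
    print(final.code, final.buggy)
```

In a real system, `toy_run` would be replaced by a call that asks a language model to write or repair code, executes it, and records any error; the selection step would likewise be delegated to an evaluator rather than a scalar score.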