BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments

Yusuf Roohani; Andrew Lee; Qian Huang; Jian Vora; Zachary Steinhart; Kexin Huang; Alexander Marson; Percy Liang; Jure Leskovec

TL;DR

BioDiscoveryAgent presents an LLM-powered agent for designing genetic perturbation experiments in a closed-loop framework, leveraging literature retrieval, pathway/gene enrichment, and AI critique to guide round-by-round experiment selection without training a dedicated acquisition function. Across six real-world single-gene datasets and a two-gene perturbation setting, the agent (using Claude 3.5 Sonnet) achieves about 21% more hits on average and up to 46% gains for non-essential genes compared with Bayesian optimization baselines, while also enabling interpretable, literature-backed predictions. Tool augmentation enhances performance for smaller models but yields mixed results for larger models, highlighting a relationship between model scale and external-data reliance. The approach demonstrates a practical, transparent paradigm for AI-assisted design of biological experiments, with potential to accelerate discovery while keeping human-in-the-loop oversight. Limitations include dataset-specific gains and the need for better integration of non-text data and more systematic tool selection.

Abstract

Agents based on large language models have shown great potential in accelerating scientific discovery by leveraging their rich background knowledge and reasoning capabilities. In this paper, we introduce BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions. We demonstrate our agent on the problem of designing genetic perturbation experiments, where the aim is to find a small subset out of many possible genes that, when perturbed, result in a specific phenotype (e.g., cell growth). Utilizing its biological knowledge, BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model or explicitly design an acquisition function as in Bayesian optimization. Moreover, BioDiscoveryAgent, using Claude 3.5 Sonnet, achieves an average of 21% improvement in predicting relevant genetic perturbations across six datasets, and a 46% improvement in the harder task of non-essential gene perturbation, compared to existing Bayesian optimization baselines specifically trained for this task. Our evaluation includes one dataset that is unpublished, ensuring it is not part of the language model's training data. Additionally, BioDiscoveryAgent predicts gene combinations to perturb more than twice as accurately as a random baseline, a task so far not explored in the context of closed-loop experiment design. The agent also has access to tools for searching the biomedical literature, executing code to analyze biological datasets, and prompting another agent to critically evaluate its predictions. Overall, BioDiscoveryAgent is interpretable at every stage, representing an accessible new paradigm in the computational design of biological experiments with the potential to augment scientists' efficacy.
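The round-by-round setting described above can be sketched as a small simulation. This is a minimal illustration, not the authors' implementation: the names `run_closed_loop` and `random_selector` are hypothetical, the selector stands in for whatever proposes genes (an LLM agent, a Bayesian optimization model, or a random baseline), and the hit ratio is assumed here to be the cumulative fraction of all true hits discovered so far.

```python
import random
from typing import Callable, Dict, List, Set

# Hypothetical selector signature: (untested genes, past observations, batch size) -> chosen genes
Selector = Callable[[List[str], Dict[str, bool], int], List[str]]


def run_closed_loop(
    candidates: List[str],
    true_hits: Set[str],
    select: Selector,
    rounds: int = 5,
    batch_size: int = 128,
) -> List[float]:
    """Simulate closed-loop selection and return the cumulative hit ratio after each round.

    Assumption: hit ratio = (true hits found so far) / (all true hits).
    """
    observed: Dict[str, bool] = {}  # gene -> was it a hit when perturbed?
    ratios: List[float] = []
    for _ in range(rounds):
        untested = [g for g in candidates if g not in observed]
        if not untested:
            break
        untested_set = set(untested)
        proposed = select(untested, observed, batch_size)
        # Keep only valid, non-repeated, previously untested genes, capped at the batch size.
        batch = list(dict.fromkeys(g for g in proposed if g in untested_set))[:batch_size]
        for gene in batch:
            observed[gene] = gene in true_hits  # stand-in for running the experiment
        found = sum(observed.values())
        ratios.append(found / len(true_hits) if true_hits else 0.0)
    return ratios


def random_selector(untested: List[str], observed: Dict[str, bool], batch_size: int) -> List[str]:
    """Random baseline: ignores past observations entirely."""
    return random.sample(untested, min(batch_size, len(untested)))
```

Swapping `random_selector` for a function that calls a language model with the task description and the contents of `observed` would give the overall shape of the agent loop, under the same assumptions.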

Paper Structure (17 sections, 9 figures, 16 tables, 1 algorithm)

This paper contains 17 sections, 9 figures, 16 tables, 1 algorithm.

Introduction
Problem formulation
BioDiscoveryAgent
Agent tools
Experiments
Datasets and Baselines
Results
Related work
Discussion
Prompt
Dataset Specific Prompts
Algorithm for BioDiscoveryAgent
Tool Descriptions for BioDiscoveryAgent
Recommendations on tool use for BioDiscoveryAgent
Baselines
...and 2 more sections

Figures (9)

Figure 1: An AI agent for closed-loop experiment design. (a) The conventional Bayesian optimization approach to experiment design involves training a machine learning model in every experimental round, scoring all perturbations, and defining an acquisition function for selecting genes to perturb in the next round. (b) Overview of BioDiscoveryAgent. In every round, the agent is given a prompt that describes the research problem, its role and task, and the experimental results from previous rounds. It generates a set of genes to perturb in the next round and provides reasoning for its prediction. The agent primarily relies on a large language model for its predictions but also has access to additional tools, such as the ability to search for relevant scientific papers on the internet.
Figure 2: Per-round performance comparison to machine learning baselines for 1-gene perturbation experiments: Each line averages the hit ratio from 10 runs, with error bars indicating 1 standard deviation. 128 genes were predicted per round over 5 rounds. For scharenberg2023spns1, a batch size of 32 was used due to its smaller size of 1061 perturbations.
Figure 3: Agent accounts for both prior knowledge and observations in decision-making. Three scenarios are considered: BioDiscoveryAgent (Claude 3.5 Sonnet) has access to both the task description and experimental observations (Prompt + Observation); has access only to experimental observations (Observation Only); or has access only to the task description (Prompt Only). (a) Hit ratio at each experimental round across 30 rounds, with 32 genes predicted in each. Each line corresponds to the average over 8 runs, with error bars representing 1 SD. (b) Jaccard similarity index between all predicted genes at 30 rounds of experimentation. Each cell corresponds to a different model run.
Figure 4: Gene selection strategy: (a) The space of genes that can be tested in a given experiment is constrained by experimental limitations. BioDiscoveryAgent can take a few tries to select genes within this limited space. (b) A common error is repeating previously tested genes. (c) This often results in the agent stalling and failing to make successful selections, especially with large batch sizes. (d) After several failed attempts, we summarize the space of genes that can be tested while prioritizing gene diversity.
Figure 5: BioDiscoveryAgent workflow with all tools over a single experimental round. Prompts and agent responses have been summarized. See the appendix for the full trace. (a) The input to the agent is the description of the problem. (b) With the literature search tool, the LLM first determines appropriate search terms for finding relevant scientific papers. The top-ranked retrieved papers are summarized by the LLM (along with line numbers that refer directly to text in the paper) and provided as additional context to the agent when predicting the set of genes to perturb. (c) The agent predicts the set of genes to perturb in the next experiment. The response is structured into Reflection, Research Plan, and Solution. (d) With the gene search tool, the LLM identifies a query gene with which to search for other genes. (e) The LLM can also be prompted differently to function as a critic and analyze the predictions so far from a different perspective. (f) Final output gene list after using the different tools. We used Claude v1 for this analysis since it showed both a high absolute score and a large performance gain from tool use on this dataset (schmidt2022crispr).
...and 4 more figures
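The Jaccard similarity index mentioned in the Figure 3 caption compares the sets of genes predicted by different runs. A minimal sketch of that comparison follows; the function names are hypothetical and the gene symbols are made-up placeholders, not data from the paper.

```python
from typing import List, Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_jaccard(runs: Sequence[Set[str]]) -> List[List[float]]:
    """Matrix of Jaccard indices; cell [i][j] compares run i with run j."""
    return [[jaccard(x, y) for y in runs] for x in runs]


# Placeholder gene sets standing in for the genes predicted by three separate runs.
runs = [{"A", "B", "C"}, {"B", "C", "D"}, {"X"}]
matrix = pairwise_jaccard(runs)
```

Here `matrix[0][1]` is 0.5, since the first two runs share two of four distinct genes, and `matrix[0][2]` is 0.0 because those runs have no genes in common.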
