BioDisco: Multi-agent hypothesis generation with dual-mode evidence, iterative feedback and temporal evaluation

Yujing Ke; Kevin George; Kathan Pandya; David Blumenthal; Maximilian Sprang; Gerrit Großmann; Sebastian Vollmer; David Antony Selby

TL;DR

BioDisco introduces a modular, multi-agent framework for grounded biomedical hypothesis generation that jointly leverages biomedical knowledge graphs and live literature. The system uses specialized agents (Background, Explorer, Scientist, Critic, Reviewer, Refiner, Planner) within an iterative feedback loop and validates hypotheses with temporal held-out evaluation, Bradley-Terry paired comparisons, and a Bayesian Rasch analysis of human evaluations. Temporal predictions on unseen datasets and ablation studies show that dual-mode grounding and iterative refinement improve novelty and significance beyond generalist biomedical agents. An open-source Python package lets researchers deploy BioDisco with customizable LLMs and knowledge graphs, advancing scalable, evidence-grounded discovery while acknowledging limitations in verifiability and real-world validation.

Abstract

Identifying novel hypotheses is essential to scientific research, yet this process risks being overwhelmed by the sheer volume and complexity of available information. Existing automated methods often struggle to generate novel and evidence-grounded hypotheses, lack robust iterative refinement, and rarely undergo rigorous temporal evaluation for future discovery potential. To address this, we propose BioDisco, a multi-agent framework that draws upon language model-based reasoning and a dual-mode evidence system (biomedical knowledge graphs and automated literature retrieval) for grounded novelty, integrates an internal scoring and feedback loop for iterative refinement, and validates performance through pioneering temporal and human evaluations, with a Bradley-Terry paired comparison model providing a statistically grounded assessment. Our evaluations demonstrate superior novelty and significance over ablated configurations and generalist biomedical agents. Designed for flexibility and modularity, BioDisco allows seamless integration of custom language models or knowledge graphs, and can be run with just a few lines of code.
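For readers unfamiliar with the Bradley-Terry paired comparison model mentioned above, the following is a minimal illustrative sketch, not the authors' implementation. It assumes each comparison is recorded as a (winner, loser) pair and estimates a positive strength per system with the standard iterative minorization-maximization update, so that the probability that system i is preferred over system j is s_i / (s_i + s_j). The system labels (`full`, `no_kg`, `no_lit`) and the comparison counts are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, n_iter=500, tol=1e-10):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Returns a dict mapping each item to a strength; strengths sum to 1.
    Assumes every item wins at least one comparison.
    """
    wins = defaultdict(int)         # total wins per item
    pair_counts = defaultdict(int)  # number of comparisons per unordered pair
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        items.update((winner, loser))
    items = sorted(items)

    strength = {i: 1.0 / len(items) for i in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            # Sum over opponents j of n_ij / (s_i + s_j)
            denom = sum(
                pair_counts.get(frozenset((i, j)), 0) / (strength[i] + strength[j])
                for j in items
                if j != i
            )
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(updated.values())
        updated = {i: v / total for i, v in updated.items()}
        delta = max(abs(updated[i] - strength[i]) for i in items)
        strength = updated
        if delta < tol:
            break
    return strength


def win_probability(strength, i, j):
    """Model probability that item i is preferred over item j."""
    return strength[i] / (strength[i] + strength[j])


# Hypothetical pairwise outcomes between three illustrative configurations.
comparisons = (
    [("full", "no_kg")] * 7 + [("no_kg", "full")] * 3
    + [("full", "no_lit")] * 8 + [("no_lit", "full")] * 2
    + [("no_kg", "no_lit")] * 6 + [("no_lit", "no_kg")] * 4
)
strengths = fit_bradley_terry(comparisons)
ranking = sorted(strengths, key=strengths.get, reverse=True)
```

Sorting by fitted strength gives a ranking of the compared systems, and `win_probability` turns any two strengths into a predicted preference rate. A full analysis would also report uncertainty in the strengths, which this sketch omits.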
