SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning

Alireza Ghafarollahi, Markus J. Buehler

TL;DR

SciAgents presents a modular, multi-agent framework that uses a large ontological knowledge graph, LLMs, and in-situ learning to automate scientific hypothesis generation. By sampling diverse subgraphs and distributing reasoning across specialized agents (Ontologist, Scientist, Critic, Planner, Assistant), the system autonomously formulates, expands, and critiques hypotheses, with novelty checks against current literature. The approach is demonstrated in bio-inspired materials contexts, illustrating scalable, high-novelty ideation and the potential to accelerate materials discovery. The results suggest that a swarm-like AI can outperform traditional, human-driven processes by systematically exploring interdisciplinary relationships and refining ideas through adversarial prompting and literature-aware scoring.

Abstract

A key challenge in artificial intelligence is the creation of systems capable of autonomously advancing scientific understanding by exploring novel domains, identifying complex patterns, and uncovering previously unseen connections in vast scientific data. In this work, we present SciAgents, an approach that leverages three core concepts: (1) the use of large-scale ontological knowledge graphs to organize and interconnect diverse scientific concepts, (2) a suite of large language models (LLMs) and data retrieval tools, and (3) multi-agent systems with in-situ learning capabilities. Applied to biologically inspired materials, SciAgents reveals hidden interdisciplinary relationships between concepts that were previously considered unrelated, achieving a scale, precision, and exploratory power that surpasses traditional human-driven research methods. The framework autonomously generates and refines research hypotheses, elucidating underlying mechanisms, design principles, and unexpected material properties. By integrating these capabilities in a modular fashion, the intelligent system yields material discoveries, critiques and improves existing hypotheses, retrieves up-to-date data about existing research, and highlights the strengths and limitations of those hypotheses. Our case studies demonstrate scalable capabilities to combine generative AI, ontological representations, and multi-agent modeling, harnessing a "swarm of intelligence" similar to biological systems. This provides new avenues for materials discovery and accelerates the development of advanced materials by unlocking Nature's design principles.

Paper Structure (24 sections, 19 figures, 5 tables)

This paper contains 24 sections, 19 figures, 5 tables.

Figures (19)

  • Figure 1: Overview of the multi-agent graph-reasoning system developed here. Panel a: overview of graph construction, as reported in [buehler2024accelerating]. The visual shows the progression from scientific papers as the data source to graph construction, with the image on the right showing a zoomed-in view of the graph. Panels b and c: two distinct approaches are presented. Panel b shows a multi-agent system based on a pre-programmed sequence of interactions between agents, ensuring consistency and reliability; panel c shows a fully automated, flexible multi-agent framework that adapts dynamically to the evolving research context. Both systems leverage a sampled path within a global knowledge graph as context to guide the research idea generation process. Each agent plays a specialized role: the Ontologist defines key concepts and relationships, Scientist 1 crafts a detailed research proposal, Scientist 2 expands and refines the proposal, and the Critic conducts a thorough review and suggests improvements. In the second approach, the Planner develops a detailed plan and the Assistant is instructed to check the novelty of the generated research hypotheses. This collaborative framework enables the generation of innovative and well-rounded scientific hypotheses that extend beyond conventional human-driven methods.
  • Figure 2: Overview of the entire process from initial keyword selection to the final document, following a hierarchical expansion strategy in which answers are successively refined and improved, enriched with retrieved data, and critiqued and amended by the identification of critical modeling, simulation, and experimental tasks. The process begins with initial keyword identification or random exploration within a graph, followed by path sampling to create a subgraph of relevant concepts and relationships (see Figure \ref{fig_11:path} for an illustration of how the path can be sampled). This subgraph forms the basis for generating structured output in JSON, including the hypothesis, outcome, mechanisms, design principles, unexpected properties, comparison, and novelty. Each component is subsequently expanded with individual prompting to yield a significant amount of additional detail, forming a comprehensive draft. This draft then undergoes a critical review process, including amendments for modeling and simulation priorities (e.g., molecular dynamics) and experimental priorities (e.g., synthetic biology). The final integrated draft, along with critical analyses, results in a document that guides further scientific inquiry.
  • Figure 3: Results from our multi-agent model, illustrating a novel research hypothesis based on a knowledge graph connecting the keywords "silk" and "energy-intensive", as an example. This visual overview shows that the system produces detailed, well-organized documentation of research development with multiple pages and detailed text (the example shown here includes 8,100 words). Details of the results are presented in the main text and other figures, and full conversations generated by the SciAgents model are included as Supplementary Information.
  • Figure 4: The knowledge graphs connecting the keywords "silk" and "energy-intensive", extracted from the global graph using (a) a random path and (b) the shortest path between the concepts. The difference between the nodes and edges sampled in the two approaches is apparent: enhanced sampling invokes a host of additional concepts that will be incorporated into research development. The richer substrate that forms the basis for agentic reasoning yields more sophisticated research concepts. Agentic reasoning carefully assesses the ideas and negotiates, via adversarial interactions between the agents, a sound prediction and carefully delineated research ideas [wu2023autogen, Ni2023Agent, ghafarollahi2024protagents, ghafarollahi2024atomagents, wu2024stateflow, wu2023empirical].
  • Figure 5: The profile of the Scientist_1 LLM agent implemented in the first proposed multi-agent approach for automated scientific discovery. The AI agent utilizes the definitions of concepts and relationships between them in the knowledge graph provided by the Ontologist to generate a novel research hypothesis.
  • ...and 14 more figures
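
The captions for Figures 2 and 4 describe sampling a path between two keywords and then filling a structured JSON record with seven components. The following is a minimal Python sketch of that idea, not the authors' implementation. The toy graph, its node names, the functions `shortest_path`, `random_path`, and `hypothesis_skeleton`, and the JSON key spellings are all assumptions made for illustration. In particular, the randomized depth-first search stands in for whatever random path sampler the paper actually uses, which is not specified in the text above.

```python
import random
from collections import deque

# Toy undirected graph as adjacency lists. Node names are illustrative
# only; they are NOT taken from the paper's global knowledge graph.
TOY_GRAPH = {
    "silk": ["fibroin", "spider"],
    "fibroin": ["silk", "beta-sheet", "biocompatibility"],
    "spider": ["silk", "dragline"],
    "dragline": ["spider", "toughness"],
    "beta-sheet": ["fibroin", "toughness"],
    "biocompatibility": ["fibroin", "processing"],
    "toughness": ["dragline", "beta-sheet", "processing"],
    "processing": ["biocompatibility", "toughness", "energy-intensive"],
    "energy-intensive": ["processing"],
}


def shortest_path(graph, source, target):
    """Breadth-first search; returns a path with the fewest edges, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk parent links back to the source
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def random_path(graph, source, target, rng):
    """Randomized depth-first search (an assumed stand-in sampler).

    Returns a simple path (no repeated nodes) that is generally longer
    than the shortest one, so it pulls in additional concepts.
    """
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            return path
        neighbors = [n for n in graph[node] if n not in path]
        rng.shuffle(neighbors)  # random exploration order
        for neighbor in neighbors:
            stack.append((neighbor, path + [neighbor]))
    return None


def hypothesis_skeleton(path):
    """Empty record with the seven components listed in the Figure 2 caption.

    Key spellings are hypothetical; in the described system the values
    would be written by the LLM agents.
    """
    keys = ["hypothesis", "outcome", "mechanisms", "design_principles",
            "unexpected_properties", "comparison", "novelty"]
    record = {key: "" for key in keys}
    record["path"] = list(path)  # sampled concepts kept as context
    return record
```

Calling `random_path(TOY_GRAPH, "silk", "energy-intensive", random.Random(0))` with different seeds gives different simple paths between the same two keywords, which illustrates why random sampling can expose more concepts to the agents than the shortest path does.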