Cross-Disciplinary Knowledge Retrieval and Synthesis: A Compound AI Architecture for Scientific Discovery

Svitlana Volkova, Peter Bautista, Avinash Hiriyanna, Gabriel Ganberg, Isabel Erickson, Zachary Klinefelter, Nick Abele, Hsien-Te Kao, Grant Engberson

TL;DR

BioSage introduces a compound AI architecture that unifies frontier models, retrieval-augmented generation, and specialized agents to bridge cross-disciplinary knowledge gaps across AI, data science, biomedical, and biosecurity domains. The system emphasizes user-centric, transparent human–AI collaboration with dedicated retrieval, translation, and reasoning agents, implemented via LlamaIndex and OpenSearch. Extensive evaluations across LitQA2, GPQA, WMDP, HLE-Bio, and a new cross-disciplinary benchmark show 13–21% improvements over vanilla RAG and base models, alongside causal analyses revealing how different components affect performance. The work highlights strong potential for accelerating scientific discovery by enabling cross-domain synthesis, while outlining future multimodal retrieval and benchmark development to further advance cross-disciplinary research workflows.

Abstract

The exponential growth of scientific knowledge has created significant barriers to cross-disciplinary knowledge discovery, synthesis, and research collaboration. In response to this challenge, we present BioSage, a novel compound AI architecture that integrates LLMs with RAG and orchestrates specialized agents and tools to enable discoveries across AI, data science, biomedical, and biosecurity domains. Our system features several specialized agents: a retrieval agent with query planning and response synthesis that enables knowledge retrieval across domains with citation-backed responses, cross-disciplinary translation agents that align specialized terminology and methodologies, and reasoning agents that synthesize domain-specific insights with transparency, traceability, and usability. We demonstrate the effectiveness of BioSage through a rigorous evaluation on scientific benchmarks (LitQA2, GPQA, WMDP, HLE-Bio) and introduce a new cross-modal benchmark for biology and AI, showing that BioSage agents powered by Llama 3.1 70B and GPT-4o outperform vanilla and RAG approaches by 13%–21%. We perform causal investigations into compound AI system behavior and report significant performance improvements from adding RAG and agents to the vanilla models. Unlike other systems, our solution is driven by user-centric design principles and orchestrates specialized user-agent interaction workflows supporting scientific activities including, but not limited to, summarization, research debate, and brainstorming. Our ongoing work focuses on multimodal retrieval and reasoning over charts, tables, and structured scientific data, along with developing comprehensive multimodal benchmarks for cross-disciplinary discovery. Our compound AI solution demonstrates significant potential for accelerating scientific advancement by reducing barriers between traditionally siloed domains.
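The agent flow described in the abstract (query planning, cross-domain term translation, per-domain retrieval, and citation-backed response synthesis) can be illustrated with a minimal pure-Python sketch. It is not BioSage's implementation: every function and class name, the `TERM_MAP` glossary, and the word-overlap scorer (a stand-in for vector search over LlamaIndex/OpenSearch indexes) are hypothetical assumptions chosen only to show how the stages hand results to one another.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Passage:
    doc_id: str  # identifier surfaced as a citation
    domain: str  # knowledge-base partition, e.g. "ai" or "biomedical"
    text: str


@dataclass
class Answer:
    text: str
    citations: list[str] = field(default_factory=list)


# Hypothetical glossary standing in for a translation agent: maps a
# specialist term to a domain-neutral paraphrase.
TERM_MAP = {"embedding": "vector representation", "ligand": "binding molecule"}


def tokens(text: str) -> set[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def plan_query(query: str, domains: list[str]) -> list[tuple[str, str]]:
    """Query planning (simplified): route the question to each target domain."""
    return [(domain, query) for domain in domains]


def translate(query: str) -> str:
    """Translation: append paraphrases of specialist terms found in the query."""
    extra = [para for term, para in TERM_MAP.items() if term in query.lower()]
    return " ".join([query, *extra])


def retrieve(sub_query: str, domain: str, corpus: list[Passage], k: int = 2) -> list[Passage]:
    """Retrieval: rank one domain's passages by word overlap (stand-in for vector search)."""
    q = tokens(sub_query)
    scored = [(len(q & tokens(p.text)), p) for p in corpus if p.domain == domain]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep the top-k passages, dropping any with zero overlap.
    return [p for score, p in scored[:k] if score > 0]


def synthesize(query: str, passages: list[Passage]) -> Answer:
    """Response synthesis: join the evidence and keep doc ids as citations."""
    if not passages:
        return Answer(text="No supporting evidence found.")
    body = " ".join(f"{p.text} [{p.doc_id}]" for p in passages)
    return Answer(text=f"{query}\n{body}", citations=[p.doc_id for p in passages])


def answer(query: str, domains: list[str], corpus: list[Passage]) -> Answer:
    """Run planning, translation, and retrieval per domain, then synthesize."""
    evidence: list[Passage] = []
    for domain, sub_query in plan_query(query, domains):
        evidence.extend(retrieve(translate(sub_query), domain, corpus))
    return synthesize(query, evidence)


corpus = [
    Passage("ai-1", "ai", "Protein language models learn a vector representation of each sequence."),
    Passage("bio-1", "biomedical", "A ligand is a binding molecule that attaches to a protein target."),
    Passage("bio-2", "biomedical", "Clinical trials measure drug safety."),
]
result = answer("How can an embedding help predict which ligand binds a protein?", ["ai", "biomedical"], corpus)
print(result.citations)  # ['ai-1', 'bio-1']
```

The translation step matters here because the AI passage never uses the word "embedding"; the paraphrase "vector representation" is what lets it match. A real system would replace the overlap scorer with embedding similarity and the fixed glossary with an LLM-driven agent.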

Paper Structure

This paper contains 27 sections and 10 figures.

Figures (10)

  • Figure 1: User-centric BioSage design demonstrating cross-disciplinary knowledge retrieval and synthesis, along with contextual conversation flow. The system processes a query, transparently explains the agents' reasoning while synthesizing knowledge across domains, and maintains conversational context to handle follow-up questions, presenting structured insights to the user.
  • Figure 2: BioSage compound AI architecture that integrates specialized agents (Query Planning, Response Synthesis, Reasoning, and Translation) with vectorized domain-specific knowledge bases through LlamaIndex. User queries flow through the UI to the Query Planning agent, which orchestrates workflows across other agents and RAG to access specialized knowledge repositories across domains.
  • Figure 3: Accuracy (%) of different BioSage compound AI configurations on four scientific benchmarks: (a) LitQA2, (b) WMDP (200-question subset), (c) HLE-Bio (303 questions), and (d) GPQA Diamond. We compare three configurations: Vanilla LLM (base model), Vanilla RAG, and Retrieval Agent. Blue bars represent GPT-4o performance, while dark blue bars show Llama-3 results. The results show significant performance improvements with BioSage's retrieval agents compared to vanilla configurations, with particularly notable gains on LitQA2 (29.6% vs. 20.2% for GPT-4o) and WMDP (92.5% vs. 69.0% for GPT-4o).
  • Figure 4: Performance gains (accuracy in percentage points) over vanilla LLMs (GPT-4o and Phi-4) on the BioSage cross-domain benchmark.
  • Figure 5: Intervention effects heatmap visualizing the causal effect of each intervention on various outcome metrics across all tested configurations. Colors represent positive effects (blue) and negative effects (red) with intensity indicating effect magnitude. The heatmap reveals distinct causal patterns between model architectures, compound AI components, and performance outcomes across metrics.
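The figure captions mix two ways of reporting improvement: Figure 4 uses absolute gains in percentage points, while a headline such as "13%–21%" could be read as either absolute or relative. A small worked example applied to the two GPT-4o comparisons quoted in the Figure 3 caption shows how far the two conventions can diverge. The helper names are illustrative, and the pairing of each quoted number with "vanilla" versus "retrieval agent" follows the caption's wording.

```python
def absolute_gain(baseline: float, treated: float) -> float:
    """Difference in accuracy, in percentage points."""
    return round(treated - baseline, 1)


def relative_gain(baseline: float, treated: float) -> float:
    """Improvement relative to the baseline accuracy, in percent."""
    return round(100 * (treated - baseline) / baseline, 1)


# GPT-4o accuracies (%) quoted in the Figure 3 caption: (vanilla, retrieval agent)
FIG3_GPT4O = {"LitQA2": (20.2, 29.6), "WMDP": (69.0, 92.5)}

for bench, (base, agent) in FIG3_GPT4O.items():
    print(f"{bench}: +{absolute_gain(base, agent)} pp, +{relative_gain(base, agent)}% relative")
# LitQA2: +9.4 pp, +46.5% relative
# WMDP: +23.5 pp, +34.1% relative
```

On LitQA2 the absolute gain is the smaller of the two, yet the relative gain is larger, because the vanilla baseline starts low; stating the convention alongside each reported improvement avoids that ambiguity.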