Patience is all you need! An agentic system for performing scientific literature review

David Brett, Anniek Myatt

TL;DR

The paper presents an agentic, LLM-enabled system for performing scientific literature reviews by combining sparse retrieval with LLM-based query expansion, long-context processing, BM25L re-ranking, and a multi-stage information extraction pipeline. It adds a verification and diversification layer (CoVe) to plan and check key statements, and introduces literature expansion to broaden coverage beyond initial results. Evaluations on LitQA2 and PubMedQA show competitive retrieval performance and credible attribution, with the full workflow achieving substantial precision and reasonable recall, while end-to-end review generation can be completed in roughly 10–30 minutes. The work demonstrates that sparse retrieval, augmented by targeted expansion and verification, can yield well-grounded reviews without dense retrieval infrastructure, offering practical impact for faster, sourced literature synthesis while highlighting areas for benchmark and scalability improvements.

Abstract

Large language models (LLMs) are increasingly used to support question answering across numerous disciplines. On their own, the models have shown promise in answering basic questions, but they fail quickly where expert domain knowledge is required or the question is nuanced. Scientific research often involves searching for relevant literature, distilling pertinent information from that literature, and analysing how the findings support or contradict one another. This information is often contained in the full text of research articles rather than just in the abstracts, and statements within these articles frequently require the wider article context to be fully understood. We have built an LLM-based system that performs this search and distillation of information from scientific literature, and we evaluate our keyword-based search and information distillation system against a set of biology-related questions from previously released literature benchmarks. We demonstrate that sparse retrieval methods achieve results close to the state of the art without the need for dense retrieval and its associated infrastructure and complexity overhead. We also show how to increase the coverage of relevant documents for literature review generation.

Paper Structure

This paper contains 19 sections, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: Data flow to support Q&A
  • Figure 2: Search rank with search term agent for benchmark sources in PubMed and PubMed Central (* with 95% CI)
  • Figure 3: LitQA2 key passage location with re-ranking (with 95% CI)
  • Figure 4: Search rank with search term agent for benchmark sources in PubMed Central (* with 95% CI)
  • Figure 5: Total potential search hits across questions in LitQA2 with search term agent