SciRAG: Adaptive, Citation-Aware, and Outline-Guided Retrieval and Synthesis for Scientific Literature

Hang Ding, Yilun Zhao, Tiansheng Hu, Manasi Patwardhan, Arman Cohan

TL;DR

SciRAG tackles the need for trustworthy, scalable synthesis across the rapidly growing scientific literature. It introduces an open-source framework that fuses adaptive retrieval, citation-aware symbolic reasoning, and outline-guided synthesis to produce coherent, well-supported answers with transparent provenance. The approach relies on a plan–critic–solve outline, a two-stage citation-graph expansion with role-based symbolic reasoning, and an answer–critique–retrieval loop to balance depth and breadth. Extensive open-retrieval experiments across SciFact, PubMedQA, QASA, and ScholarQA show state-of-the-art performance in factual accuracy and synthesis quality, complemented by ablation and human-evaluation analyses. Limitations include reliance on general-purpose LLMs without domain-specific citation fine-tuning and non-trivial computational overhead, pointing to future work on lightweight, domain-tuned components and efficiency optimizations.

Abstract

The accelerating growth of scientific publications has intensified the need for scalable, trustworthy systems to synthesize knowledge across diverse literature. While recent retrieval-augmented generation (RAG) methods have improved access to scientific information, they often overlook citation graph structure, adapt poorly to complex queries, and yield fragmented, hard-to-verify syntheses. We introduce SciRAG, an open-source framework for scientific literature exploration that addresses these gaps through three key innovations: (1) adaptive retrieval that flexibly alternates between sequential and parallel evidence gathering; (2) citation-aware symbolic reasoning that leverages citation graphs to organize and filter supporting documents; and (3) outline-guided synthesis that plans, critiques, and refines answers to ensure coherence and transparent attribution. Extensive experiments across multiple benchmarks such as QASA and ScholarQA demonstrate that SciRAG outperforms prior systems in factual accuracy and synthesis quality, establishing a new foundation for reliable, large-scale scientific knowledge aggregation.

Paper Structure

This paper contains 31 sections, 8 figures, 7 tables, 1 algorithm.

Figures (8)

  • Figure 1: An overview of the SciRAG framework.
  • Figure 2: An illustration of the SciRAG pipeline. The process begins with guideline drafting and initial answer generation. Each retrieval node searches documents, decides whether to expand along the citation graph, builds contribution chains, and applies reasoning-based reranking to judge from current information whether to continue or stop. Adaptive retrieval integrates multiple nodes to balance sequential exploration for depth and parallel exploration for breadth. Finally, backtrack-editing consolidates all evidence and produces a coherent, well-documented answer.
  • Figure 3: An example comparing SciRAG with a representative baseline.
  • Figure : (d) ScholarQA-CS
  • Figure : (d) ScholarQA-CS
  • ...and 3 more figures
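
The retrieval-node behavior described in the Figure 2 caption (search, optional expansion along the citation graph, reranking, and a continue-or-stop decision) can be illustrated with a small toy sketch. This is not SciRAG's implementation: the names `Paper`, `relevance`, `retrieval_node`, and `adaptive_retrieve`, the thresholds, and the three-paper corpus are all hypothetical, and simple word overlap stands in for the LLM-based reasoning the paper describes. Contribution chains and backtrack-editing are omitted, and the parallel branches are simulated by a plain loop.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    pid: str
    text: str
    cites: list = field(default_factory=list)  # ids of papers this one cites


def relevance(query, paper):
    """Toy relevance: fraction of query words found in the paper text."""
    q = set(query.lower().split())
    return len(q & set(paper.text.lower().split())) / len(q) if q else 0.0


def retrieval_node(query, corpus, seen, top_k=2, expand_at=0.5):
    """One node: search, expand along citations, then rerank and filter."""
    ranked = sorted(
        (p for p in corpus.values() if p.pid not in seen and relevance(query, p) > 0),
        key=lambda p: relevance(query, p),
        reverse=True,
    )[:top_k]
    pool = {p.pid: p for p in ranked}
    for p in ranked:
        # Assumed rule: only follow citations of sufficiently relevant hits.
        if relevance(query, p) >= expand_at:
            for cid in p.cites:
                if cid in corpus and cid not in seen:
                    pool.setdefault(cid, corpus[cid])
    reranked = sorted(pool.values(), key=lambda p: relevance(query, p), reverse=True)
    return [p for p in reranked if relevance(query, p) > 0]


def adaptive_retrieve(sub_queries, corpus, enough=2, max_depth=2):
    """Breadth: one branch per sub-query. Depth: repeat a branch until it has enough evidence."""
    seen, evidence = set(), []
    for q in sub_queries:
        found = 0
        for _ in range(max_depth):
            if found >= enough:
                break  # stop: this branch already has enough evidence
            docs = retrieval_node(q, corpus, seen)
            if not docs:
                break  # stop: nothing new to retrieve
            for d in docs:
                seen.add(d.pid)
                evidence.append((q, d.pid))
            found += len(docs)
    return evidence


# Hypothetical three-paper corpus, for illustration only.
corpus = {
    "A": Paper("A", "adaptive retrieval for scientific question answering", ["B"]),
    "B": Paper("B", "citation graph retrieval for literature"),
    "C": Paper("C", "protein folding dynamics"),
}
evidence = adaptive_retrieve(["adaptive retrieval", "protein folding"], corpus)
```

With `top_k=1`, the initial search for "adaptive retrieval" returns only paper A, and paper B enters the pool solely through A's citation link, which is the expansion step in miniature. A real system would replace `relevance` with a retriever plus a reasoning-based reranker and would run the sub-query branches concurrently.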