Ask, Retrieve, Summarize: A Modular Pipeline for Scientific Literature Summarization

Pierre Achkar; Tim Gollub; Martin Potthast

TL;DR

Ask, Retrieve, Summarize (XSum) addresses the rapid growth of scientific literature with a modular Retrieval-Augmented Generation pipeline for multi-document summarization. It introduces a question-generation module that creates targeted retrieval queries from the input papers and an editor module that synthesizes the retrieved content into coherent, citation-rich summaries suitable for academic use. Evaluated on the SurveySum dataset, XSum achieves notable improvements in CheckEval, G-Eval, and Ref-F1 over existing approaches, demonstrating stronger content coverage, coherence, and citation fidelity. The framework emphasizes transparency and adaptability, enabling domain-specific customization and potential extensions to broader domains and modalities. Overall, XSum provides a practical, scalable approach to automated scholarly summarization that helps researchers cope with information overload and keep their knowledge current across multiple sources.

Abstract

The exponential growth of scientific publications has made it increasingly difficult for researchers to stay up to date and synthesize knowledge effectively. This paper presents XSum, a modular pipeline for multi-document summarization (MDS) in the scientific domain using Retrieval-Augmented Generation (RAG). The pipeline includes two core components: a question-generation module and an editor module. The question-generation module dynamically generates questions adapted to the input papers, ensuring the retrieval of relevant and accurate information. The editor module synthesizes the retrieved content into coherent, well-structured summaries that adhere to academic citation standards. Evaluated on the SurveySum dataset, XSum demonstrates strong performance, achieving considerable improvements in metrics such as CheckEval, G-Eval, and Ref-F1 compared to existing approaches. This work provides a transparent, adaptable framework for scientific summarization with potential applications in a wide range of domains. The code is available at https://github.com/webis-de/scolia25-xsum
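To make the ask, retrieve, summarize flow described in the abstract concrete, the following is a minimal sketch of how the three stages could be wired together. It is not the authors' implementation: every name here (`Passage`, `generate_questions`, `retrieve`, `edit`, `summarize`) is hypothetical, the question generator and the editor are simple placeholders for what would be language-model calls, and the retriever is a toy word-overlap ranker standing in for whatever retrieval method the actual pipeline uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    """A chunk of text from an input paper (hypothetical structure)."""
    paper_id: str  # citation key of the source paper
    text: str


def generate_questions(titles):
    # Placeholder for the question-generation module. A real system would
    # prompt a language model; this fixed template is only illustrative.
    return [f"What does the paper '{t}' contribute?" for t in titles]


def _tokens(text):
    # Lowercased words with surrounding punctuation stripped.
    return {w.strip(".,?!'\"").lower() for w in text.split()} - {""}


def retrieve(question, passages, k=2):
    # Toy lexical retriever (an assumption, not the paper's method):
    # rank passages by the number of words shared with the question.
    q = _tokens(question)
    ranked = sorted(passages, key=lambda p: len(q & _tokens(p.text)), reverse=True)
    return ranked[:k]


def edit(questions, evidence):
    # Placeholder for the editor module: join retrieved passages into one
    # text, attach a bracketed citation to each, and skip duplicates.
    seen, sentences = set(), []
    for question in questions:
        for passage in evidence[question]:
            if passage not in seen:
                seen.add(passage)
                sentences.append(f"{passage.text} [{passage.paper_id}]")
    return " ".join(sentences)


def summarize(titles, passages, k=2):
    # Ask -> retrieve -> summarize, with each stage swappable.
    questions = generate_questions(titles)
    evidence = {q: retrieve(q, passages, k) for q in questions}
    return edit(questions, evidence)
```

Because each stage is a plain function, any one of them can be replaced (for example, swapping the overlap ranker for a dense retriever) without touching the others, which is the kind of modularity the abstract emphasizes.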
