LiRA: A Multi-Agent Framework for Reliable and Readable Literature Review Generation

Gregory Hok Tjoan Go, Khang Ly, Anders Søgaard, Amin Tabatabaei, Maarten de Rijke, Xinyi Chen

TL;DR

The paper addresses the growing challenge of producing up-to-date, readable, and factually accurate literature reviews. It introduces LiRA, a multi-agent, agentic-LLM workflow with specialized roles for outlining, subsection writing, editing, reviewing, and citation grounding, designed to work out of the box without task-specific fine-tuning. Empirical results on SciReviewGen and ScienceDirect show that LiRA outperforms open-source baselines in writing quality and citation reliability while maintaining competitive content similarity to human-written reviews, including under retrieval-based deployment and with different reviewer models. The work demonstrates the practical potential of agentic LLM workflows for automated scientific writing and outlines pathways toward real-world adoption and future end-to-end review pipelines.

Abstract

The rapid growth of scientific publications has made it increasingly difficult to keep literature reviews comprehensive and up-to-date. Although prior work has focused on automating retrieval and screening, the writing phase of systematic reviews remains largely under-explored, especially with regard to readability and factual accuracy. To address this, we present LiRA (Literature Review Agents), a multi-agent collaborative workflow that emulates the human literature review process. LiRA uses specialized agents for content outlining, subsection writing, editing, and reviewing to produce cohesive and comprehensive review articles. Evaluated on SciReviewGen and a proprietary ScienceDirect dataset, LiRA outperforms current baselines such as AutoSurvey and MASS-Survey in writing and citation quality, while maintaining competitive similarity to human-written reviews. We further evaluate LiRA in real-world scenarios using document retrieval and assess its robustness to variation in the reviewer model. Our findings highlight the potential of agentic LLM workflows, even without domain-specific tuning, to improve the reliability and usability of automated scientific writing.
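The abstract describes LiRA as a sequence of specialized agents (outlining, subsection writing, editing, reviewing), and Figure 1 adds a refinement loop between reviewing and revising. The following is a minimal sketch of that control flow only, under loud assumptions: each agent is modeled as a plain Python function instead of an LLM call, and the names `run_review_pipeline`, `Review`, the `toy_*` stubs, and the `max_rounds` cap are hypothetical illustrations, not the paper's implementation. Retrieval and citation grounding are not modeled.

```python
from dataclasses import dataclass
from typing import Callable

# HYPOTHETICAL sketch: agents are plain functions standing in for LLM calls.
# None of these names come from the LiRA paper.


@dataclass
class Review:
    """Verdict returned by the (hypothetical) reviewer agent."""
    approved: bool
    feedback: str


def run_review_pipeline(
    topic: str,
    papers: list[str],
    outline_agent: Callable[[str, list[str]], list[str]],
    writer_agent: Callable[[str, list[str]], str],
    editor_agent: Callable[[str], str],
    reviewer_agent: Callable[[str], Review],
    revise_agent: Callable[[str, str], str],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Outline -> write subsections -> edit -> review/refine loop.

    Returns the final draft and the number of revision rounds performed.
    """
    # 1. Outlining agent proposes section headings from the topic and sources.
    headings = outline_agent(topic, papers)
    # 2. Subsection writer drafts each heading independently.
    sections = [writer_agent(h, papers) for h in headings]
    # 3. Editor merges the subsections into one draft.
    draft = editor_agent("\n\n".join(sections))
    # 4. Reviewer critiques; revise until approved or the round cap is hit.
    rounds = 0
    while rounds < max_rounds:
        review = reviewer_agent(draft)
        if review.approved:
            break
        draft = revise_agent(draft, review.feedback)
        rounds += 1
    return draft, rounds


# Toy stand-ins for LLM calls, for illustration only.
def toy_outline(topic: str, papers: list[str]) -> list[str]:
    return [f"{topic}: Background", f"{topic}: Methods"]


def toy_write(heading: str, papers: list[str]) -> str:
    return f"{heading}\nSummary drawing on {', '.join(papers)}."


def toy_edit(text: str) -> str:
    return text.strip()


def toy_review(text: str) -> Review:
    # Approve only once a revision marker is present.
    return Review(approved="[revised]" in text, feedback="Add transitions.")


def toy_revise(text: str, feedback: str) -> str:
    return f"{text}\n[revised] {feedback}"


if __name__ == "__main__":
    final, n = run_review_pipeline(
        "LLM agents", ["[1]", "[2]"],
        toy_outline, toy_write, toy_edit, toy_review, toy_revise,
    )
    print(n)  # prints 1: one revision round before approval
    print(final)
```

Passing agents in as callables keeps the orchestration separate from any particular model, which loosely mirrors the paper's point that the reviewer model can be swapped; the `max_rounds` cap is an assumed safeguard so that a reviewer that never approves cannot loop forever.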

Paper Structure

This paper contains 40 sections, 2 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: An overview of the LiRA architecture. The narrow dotted arrows represent document input/output, the wide dotted arrows indicate the refinement process, and the solid arrows show the system's main flow. Each agent is described in the sections below.
  • Figure 2: SME evaluation results. Here, C indicates Coverage, S indicates Structure, and R indicates Relevance.
  • Figure 3: Annotation sample for the SciReviewGen dataset.
  • Figure 4: Textual similarity and citation quality results for the different reviewer model settings.
  • Figure 5: Prometheus evaluation results for the different reviewer model settings.