Leveraging LLMs for Semi-Automatic Corpus Filtration in Systematic Literature Reviews

Lucas Joos; Daniel A. Keim; Maximilian T. Fischer

TL;DR

This paper presents LLMSurver, an open-source, human-in-the-loop pipeline for semi-automatic corpus filtration in systematic literature reviews (SLRs). By ensembling multiple LLMs and applying a consensus scheme, the approach substantially reduces manual screening time while preserving high recall, as demonstrated on a corpus of about 8.3k papers retrieved for a real SLR. Across the mid-2024 and fall-2025 model cohorts, open-source models increasingly matched or exceeded commercial performance, with careful prompt design and interactive supervision enabling robust, transparent decision-making. The work advances responsible AI-assisted research workflows, emphasizing privacy, reproducibility, and practical integration into academic practice.

Abstract

Systematic literature reviews (SLRs) are critical for analyzing the landscape of a research field and guiding future research directions. However, retrieving and filtering the literature corpus for an SLR is highly time-consuming and requires extensive manual effort, as keyword-based searches in digital libraries often return numerous irrelevant publications. In this work, we propose a pipeline in which multiple large language models (LLMs) classify papers based on descriptive prompts and reach a joint decision through a consensus scheme. The entire process is human-supervised and interactively controlled via our open-source visual analytics web interface, LLMSurver, which enables real-time inspection and modification of model outputs. We evaluate our approach using ground-truth data from a recent SLR comprising over 8,000 candidate papers, benchmarking both open and commercial state-of-the-art LLMs from mid-2024 and fall 2025. Results demonstrate that our pipeline significantly reduces manual effort while achieving lower error rates than single human annotators. Furthermore, modern open-source models prove sufficient for this task, making the method accessible and cost-effective. Overall, our work demonstrates how responsible human-AI collaboration can accelerate and enhance systematic literature reviews within academic workflows.
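The consensus step described above can be illustrated with a minimal sketch. The abstract does not specify the actual scheme, so everything here is an assumption made for illustration: the function name `consensus`, the `min_votes` parameter, the three outcome labels, and the rule that papers without sufficient agreement are routed to a human reviewer. It is not LLMSurver's implementation.

```python
from typing import Dict, Optional


def consensus(votes: Dict[str, bool], min_votes: Optional[int] = None) -> str:
    """Combine per-model relevance votes for one paper (hypothetical scheme).

    votes maps a model name to True (relevant) or False (irrelevant).
    min_votes is the number of agreeing models required for an automatic
    decision; by default all models must agree (unanimity).
    Returns "include", "exclude", or "review" (hand over to a human).
    """
    n = len(votes)
    if n == 0:
        # No model output available: leave the decision to the human.
        return "review"

    needed = n if min_votes is None else min_votes
    yes = sum(votes.values())  # True counts as 1
    no = n - yes

    # Inclusion is checked first, so a tie at a low threshold keeps
    # the paper rather than dropping it (favoring recall).
    if yes >= needed:
        return "include"
    if no >= needed:
        return "exclude"
    return "review"


def triage(corpus_votes: Dict[str, Dict[str, bool]]) -> Dict[str, str]:
    """Apply the unanimity rule to every paper in a corpus."""
    return {paper: consensus(v) for paper, v in corpus_votes.items()}
```

Under the default unanimity rule, a paper is removed automatically only when every model judges it irrelevant, and any disagreement lands in the manual review queue. Lowering `min_votes` would trade some of that caution for less manual work.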
