
AISysRev -- LLM-based Tool for Title-abstract Screening

Aleksi Huotala, Miikka Kuutila, Olli-Pekka Turtio, Mika Mäntylä

TL;DR

This work tackles the bottleneck of title-abstract screening in software engineering systematic reviews by introducing AiSysRev, a Dockerized web tool that uses OpenRouter to query multiple LLMs and supports zero-shot and few-shot prompting along with manual review interfaces. The tool ingests a CSV of paper titles and abstracts, applies user-specified inclusion and exclusion criteria, and exports structured results for downstream analysis, enabling a human-in-the-loop workflow. In a trial with 137 papers, the authors categorize screening outcomes into Easy Includes, Easy Excludes, Boundary Includes, and Boundary Excludes, underscoring that boundary cases still require human judgment. Overall, AiSysRev offers a practical, extensible platform to speed up systematic reviews while preserving necessary expert oversight, with open-source artifacts and plans for feature expansion such as full-text screening and visualization of decision boundaries.

Abstract

Systematic reviews are a standard practice for summarizing the state of evidence in software engineering. Conducting systematic reviews is laborious, especially during the screening or study selection phase, where the number of papers can be overwhelming. During this phase, papers are assessed against inclusion and exclusion criteria based on their titles and abstracts. Recent research has demonstrated that large language models (LLMs) can perform title-abstract screening at a level comparable to that of a master's student. While LLMs cannot be fully trusted, they can help, for example, in Rapid Reviews, which try to expedite the review process. Building on recent research, we developed AiSysRev, an LLM-based screening tool implemented as a web application running in a Docker container. The tool accepts a CSV file containing paper titles and abstracts. Users specify inclusion and exclusion criteria. One can use multiple LLMs for screening via OpenRouter. AiSysRev supports both zero-shot and few-shot screening, and also allows for manual screening through interfaces that display LLM results as guidance for human reviewers. We conducted a trial study with 137 papers using the tool. Our findings indicate that papers can be classified into four categories: Easy Includes, Easy Excludes, Boundary Includes, and Boundary Excludes. The Boundary cases, where LLMs are prone to errors, highlight the need for human intervention. While LLMs do not replace human judgment in systematic reviews, they can significantly reduce the burden of assessing large volumes of scientific literature.

Video: https://www.youtube.com/watch?v=jVbEj4Y4tQI
Tool: https://github.com/EvoTestOps/AISysRev
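
The screening input described in the abstract (a CSV of titles and abstracts plus user-specified inclusion and exclusion criteria) can be illustrated with a minimal sketch of how a zero-shot prompt might be assembled. This is not AiSysRev's actual code: the column names `title` and `abstract`, the function names, and the prompt wording are all assumptions made for illustration, and no LLM or OpenRouter call is made.

```python
import csv
import io


def build_zero_shot_prompt(title, abstract, inclusion, exclusion):
    """Assemble one zero-shot screening prompt (hypothetical wording)."""
    lines = ["You are screening papers for a systematic review.", ""]
    lines.append("Inclusion criteria:")
    lines += [f"- {c}" for c in inclusion]
    lines.append("")
    lines.append("Exclusion criteria:")
    lines += [f"- {c}" for c in exclusion]
    lines += [
        "",
        f"Title: {title}",
        f"Abstract: {abstract}",
        "",
        "Answer with INCLUDE or EXCLUDE and a one-sentence reason.",
    ]
    return "\n".join(lines)


def prompts_from_csv(csv_text, inclusion, exclusion):
    """Build one prompt per CSV row.

    Assumes the CSV has 'title' and 'abstract' header columns; the real
    tool's expected column names may differ.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        build_zero_shot_prompt(row["title"], row["abstract"], inclusion, exclusion)
        for row in reader
    ]


if __name__ == "__main__":
    sample = 'title,abstract\n"Paper A","About LLM screening."\n'
    for prompt in prompts_from_csv(sample, ["Uses LLMs"], ["Not software engineering"]):
        print(prompt)
```

A few-shot variant would prepend a handful of already-screened papers with their decisions before the paper under review; the resulting prompt text is what would be sent to each selected model.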

Paper Structure

This paper contains 11 sections, 1 figure.

Figures (1)

  • Figure 1: The AiSysRev tool architecture.