ProfOlaf: Semi-Automated Tool for Systematic Literature Reviews

Martim Afonso, Nuno Saavedra, Bruno Lourenço, Alexandra Mendes, João Ferreira

TL;DR

This paper introduces ProfOlaf, a semi-automated tool that streamlines systematic literature reviews by combining iterative snowballing for article collection with large language model (LLM)-assisted analysis for topic extraction and for querying paper contents. The workflow keeps humans in the loop during screening to maintain methodological rigor, and uses TopicGPT for topic modeling and a Task Assistant for targeted data extraction and summaries. An illustrative evaluation covers search/screening and data-extraction tasks; it shows promising efficiency gains but also limitations in topic modeling accuracy and language tagging, areas that benefit from human oversight. ProfOlaf is open-source and aims to enhance the quality, volume, and reproducibility of systematic reviews across research domains, particularly in software engineering.

Abstract

Systematic reviews and mapping studies are critical for synthesizing research, identifying gaps, and guiding future work, but they are often labor-intensive and time-consuming. Existing tools provide partial support for specific steps, leaving much of the process manual and error-prone. We present ProfOlaf, a semi-automated tool designed to streamline systematic reviews while maintaining methodological rigor. ProfOlaf supports iterative snowballing for article collection with human-in-the-loop filtering and uses large language models to assist in analyzing articles, extracting key topics, and answering queries about the content of papers. By combining automation with guided manual effort, ProfOlaf enhances the efficiency, quality, and reproducibility of systematic reviews across research fields. A video describing and demonstrating ProfOlaf is available at: https://youtu.be/4noUXfcmxsE

Paper Structure

This paper contains 16 sections, 1 figure, 2 tables.

Figures (1)

  • Figure 1: Overview of the ProfOlaf methodology. The search phase begins with an initial set of articles. Snowballing is applied, bibliographic information is retrieved, and venues may optionally be ranked. In the screening phase, article metadata is checked. Two or more human raters screen the articles by title and by full paper, with disagreements resolved collaboratively. This cycle continues until no new articles are identified. Duplicates are removed to form the final set of articles. In the data extraction phase, TopicGPT categorizes content into research topics, and LLM-based question answering is employed to extract structured insights. Manual inspection complements this step, ensuring a consolidated and reliable final set of extracted data.
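
The search and screening cycle described in the Figure 1 caption can be read as a fixed-point iteration: snowball from the currently accepted articles, screen the new candidates, and stop when a round accepts nothing new. The following is a minimal Python sketch of that loop under stated assumptions, not ProfOlaf's actual implementation. The names `snowball`, `neighbors`, and `accept` are hypothetical: `neighbors` stands in for whatever citation lookup is used, and `accept` stands in for the human raters' consolidated screening decision.

```python
from typing import Callable, Iterable, Set


def snowball(
    seeds: Iterable[str],
    neighbors: Callable[[str], Iterable[str]],
    accept: Callable[[str], bool],
) -> Set[str]:
    """Repeat snowballing rounds until a round accepts no new article.

    seeds     -- initial set of article identifiers (assumed already screened)
    neighbors -- hypothetical lookup returning articles linked by citation
    accept    -- hypothetical stand-in for the human screening decision
    """
    accepted: Set[str] = set(seeds)
    seen: Set[str] = set(accepted)      # every candidate considered so far
    frontier: Set[str] = set(accepted)  # articles accepted in the last round

    while frontier:
        # Collect candidates reachable from the last round's accepted articles.
        candidates: Set[str] = set()
        for article in frontier:
            candidates.update(neighbors(article))

        # Drop anything already considered, so each article is screened once.
        candidates -= seen
        seen |= candidates

        # Screening: only accepted candidates seed the next round.
        frontier = {c for c in candidates if accept(c)}
        accepted |= frontier

    return accepted


# Toy citation graph, purely illustrative.
toy_graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["A"]}
result = snowball(
    seeds=["A"],
    neighbors=lambda a: toy_graph.get(a, []),
    accept=lambda a: a != "C",  # pretend the raters reject article "C"
)
```

Tracking `seen` separately from `accepted` means a rejected article is never presented to the raters again, and the loop terminates because each round can only add articles that have not been considered before.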