On the Use of a Large Language Model to Support the Conduction of a Systematic Mapping Study: A Brief Report from a Practitioner's View

Cauã Ferreira Barros, Marcos Kalinowski, Mohamad Kassab, Valdemar Vicente Graciano Neto

TL;DR

This paper presents an experience report on using large language models (LLMs) to support an end-to-end systematic mapping study (SMS) in software engineering. It compares manual baseline procedures with LLM-assisted screening and data extraction, reporting substantial time savings and high but not perfect accuracy, with manual verification used to mitigate hallucinations. The study identifies benefits such as standardization and efficiency alongside risks such as prompt sensitivity and model hallucination, and offers concrete recommendations for researchers. The work provides an integrated, end-to-end account that helps practitioners adopt LLMs responsibly in SMS and similar evidence-synthesis tasks.

Abstract

The use of Large Language Models (LLMs) has drawn growing interest within the scientific community. LLMs can handle large volumes of textual data and support methods for evidence synthesis. Although recent studies highlight the potential of LLMs to accelerate the screening and data extraction steps of systematic reviews, detailed reports of their practical application throughout the entire process remain scarce. This paper presents an experience report on conducting a systematic mapping study with the support of LLMs, describing the steps followed, the necessary adjustments, and the main challenges faced. Positive aspects are discussed, such as (i) the significant reduction of time spent on repetitive tasks and (ii) greater standardization in data extraction. Negative aspects are also discussed, including (i) the considerable effort needed to build reliable, well-structured prompts, especially for less experienced users, since arriving at effective prompts may require several rounds of iteration and testing, which can partially offset the expected time savings, (ii) the occurrence of hallucinations, and (iii) the need for constant manual verification. As a contribution, this work offers lessons learned and practical recommendations for researchers interested in adopting LLMs in systematic mappings and reviews, highlighting both the efficiency gains and the methodological risks and limitations to be considered.

Paper Structure (15 sections, 2 tables)