Replication in Requirements Engineering: the NLP for RE Case
Sallam Abualhaija, F. Başak Aydemir, Fabiano Dalpiaz, Davide Dell'Anna, Alessio Ferrari, Xavier Franch, Davide Fucci
TL;DR
The paper tackles the limited replication support in NLP4RE by introducing the ID-Card, a structured artifact that records replication-relevant information. The ID-Card is developed through a design-science process and focus groups, and is grounded in two hands-on replication cases: anaphoric ambiguity detection and FR-NFR classification. The study identifies 16 replication challenges across dataset annotation and tool reconstruction, and demonstrates how the ID-Card can complement primary papers to improve reproducibility. The work aims to raise awareness of replication in NLP4RE and provides a practical tool to facilitate replication, education, and future artifact evaluation.
Abstract
[Context] Natural language processing (NLP) techniques have been widely applied in the requirements engineering (RE) field to support tasks such as classification and ambiguity detection. Despite its empirical vocation, RE research has given limited attention to the replication of NLP for RE studies. Replication is hampered by several factors, including the context specificity of the studies, the heterogeneity of the tasks involving NLP, the tasks' inherent hairiness, and, in turn, the heterogeneous reporting structure. [Solution] To address these issues, we propose a new artifact, referred to as ID-Card, whose goal is to provide a structured summary of research papers emphasizing replication-relevant information. We construct the ID-Card through a structured, iterative process based on design science. [Results] In this paper: (i) we report on hands-on experiences of replication, (ii) we review the state of the art and extract replication-relevant information, (iii) we identify, through focus groups, challenges across two typical dimensions of replication: data annotation and tool reconstruction, and (iv) we present the concept and structure of the ID-Card to mitigate the identified challenges. [Contribution] This study aims to raise awareness of replication in NLP for RE. We propose an ID-Card that is intended to foster study replication, but can also be used in other contexts, e.g., for educational purposes.
