Transfer learning for conflict and duplicate detection in software requirement pairs
Garima Malik, Savas Yildirim, Mucahit Cevik, Ayse Bener, Devang Parikh
TL;DR
This paper tackles automatic detection of conflicting and duplicate software requirements. It introduces SR-BERT, a Sentence-BERT-based bi-encoder, combined with sequential transfer learning (e.g., MNLI pretraining followed by CDN fine-tuning) and cross-domain transfer strategies, plus rule-based filtering to refine predictions. The method encodes each requirement pair as $R_1 \oplus R_2 \oplus (R_1 - R_2)$, where $R_1$ and $R_2$ are the embeddings of the two requirements and $\oplus$ denotes concatenation. It is evaluated on a proprietary CDN dataset and four open-source SRS datasets, showing strong performance on larger datasets and promising cross-domain generalization when augmented with information extraction rules. The work offers practical guidance for practitioners and researchers by demonstrating how sequential learning, domain adaptation, and rule-based post-processing can automate conflict and duplicate detection in RE, and it provides dataset resources and statistical validation of the results.
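To make the pair representation $R_1 \oplus R_2 \oplus (R_1 - R_2)$ concrete, the following is a minimal sketch that reads $\oplus$ as vector concatenation. The function name `pair_features` and the toy 4-dimensional vectors are assumptions for illustration only. In the paper's setting the embeddings would come from the sentence encoder, and the resulting feature vector would feed a classification layer. Neither of those is shown here.

```python
import numpy as np


def pair_features(r1: np.ndarray, r2: np.ndarray) -> np.ndarray:
    """Build the pair representation R1 (+) R2 (+) (R1 - R2).

    Hypothetical helper: concatenates the two requirement embeddings
    with their element-wise difference along the last axis, so it
    works for a single pair or for a batch of pairs.
    """
    if r1.shape != r2.shape:
        raise ValueError("both embeddings must have the same shape")
    return np.concatenate([r1, r2, r1 - r2], axis=-1)


# Toy embeddings with made-up values, standing in for encoder output.
r1 = np.array([0.2, -0.1, 0.5, 0.0])
r2 = np.array([0.1, 0.3, 0.5, -0.4])

features = pair_features(r1, r2)
print(features.shape)  # (12,) since 3 blocks of 4 dimensions are concatenated
```

The difference term gives the downstream classifier a direct signal of where the two requirements diverge, and the output is three times as wide as a single embedding.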
Abstract
Consistent and holistic expression of software requirements is important for the success of software projects. In this study, we aim to enhance the efficiency of software development processes by automatically identifying conflicting and duplicate software requirement specifications. We formulate the conflict and duplicate detection problem as a requirement pair classification task. We design a novel transformer-based architecture, SR-BERT, which incorporates Sentence-BERT and bi-encoders for the conflict and duplicate identification task. Furthermore, we apply supervised multi-stage fine-tuning to the pre-trained transformer models. We test the performance of different transfer models using four different datasets. We find that sequentially trained and fine-tuned transformer models perform well across the datasets, with SR-BERT achieving the best performance on the larger datasets. We also explore the cross-domain performance of conflict detection models and adopt a rule-based filtering approach to validate the model classifications. Our analysis indicates that the sentence pair classification approach and the proposed transformer-based natural language processing strategies can contribute significantly to achieving automation in conflict and duplicate detection.
