Lessons from the Use of Natural Language Inference (NLI) in Requirements Engineering Tasks

Mohamad Fazelnia, Viktoria Koscinski, Spencer Herzog, Mehdi Mirakhorli

TL;DR

This work investigates applying natural language inference (NLI) to automate requirements engineering (RE) tasks. The authors reframe three RE tasks (requirements classification and characterization, defect detection, and conflict detection) as entailment problems and enhance them with label verbalization and domain knowledge. The resulting RoBERTa-based NLI pipeline outperforms prompt-based, transfer-learning, probabilistic, and some LLM baselines, especially on the first two tasks. They also introduce two new datasets for defects and conflicts, and provide zero-shot cross-project evidence of NLI's robustness, while acknowledging limitations in capturing compositional conflicts that involve interactions among multiple requirements. The study yields practical guidance on data preparation, verbalization, and prompt design to maximize NLI's effectiveness in RE, offering a data-efficient approach with potential for industry adoption and further research into multi-premise reasoning. Overall, the results support NLI as a competitive, scalable method for automating core RE tasks, with implications for reducing manual effort and improving requirement quality.

Abstract

We investigate the use of Natural Language Inference (NLI) in automating requirements engineering tasks. In particular, we focus on three tasks: requirements classification, identification of requirements specification defects, and detection of conflicts in stakeholders' requirements. While previous research has demonstrated significant benefit in using NLI as a universal method for a broad spectrum of natural language processing tasks, these advantages have not been investigated within the context of software requirements engineering. Therefore, we design experiments to evaluate the use of NLI in requirements analysis. We compare the performance of NLI with a spectrum of approaches, including prompt-based models, conventional transfer learning, chatbots powered by Large Language Models (LLMs), and probabilistic models. Through experiments conducted under various learning settings, including conventional learning and zero-shot learning, we demonstrate conclusively that our NLI method surpasses classical NLP methods as well as other LLM-based and chatbot models in the analysis of requirements specifications. Additionally, we share lessons learned, characterizing the learning settings that make NLI a suitable approach for automating requirements engineering tasks.
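
To make the entailment reframing from the TL;DR concrete, here is a minimal sketch of classification as NLI: the requirement is the premise, each candidate label is verbalized into a hypothesis sentence, and the label whose hypothesis scores highest for entailment is chosen. The verbalization sentences, the function names, and the word-overlap scorer are illustrative assumptions and do not come from the paper; in practice the scorer would be a fine-tuned NLI model such as the RoBERTa-based one the authors describe.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical label verbalizations (not the paper's wording): each class
# label is turned into a natural-language hypothesis sentence.
VERBALIZERS: Dict[str, str] = {
    "functional": "This requirement describes a function the system must perform.",
    "non-functional": "This requirement describes a quality constraint on the system.",
}


def build_nli_pairs(
    requirement: str, verbalizers: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (label, premise, hypothesis) triples, one per candidate label."""
    return [(label, requirement, hyp) for label, hyp in verbalizers.items()]


def classify(
    requirement: str,
    verbalizers: Dict[str, str],
    entailment_score: Callable[[str, str], float],
) -> str:
    """Pick the label whose hypothesis is most strongly entailed by the premise."""
    pairs = build_nli_pairs(requirement, verbalizers)
    best = max(pairs, key=lambda t: entailment_score(t[1], t[2]))
    return best[0]


def toy_overlap_score(premise: str, hypothesis: str) -> float:
    """Stand-in scorer (assumption): fraction of hypothesis tokens found in the
    premise. A real pipeline would return an NLI model's entailment probability."""
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(p & h) / len(h) if h else 0.0


if __name__ == "__main__":
    req = "The system shall respond to search queries within 2 seconds."
    for label, premise, hypothesis in build_nli_pairs(req, VERBALIZERS):
        print(f"{label}: premise={premise!r} hypothesis={hypothesis!r}")
    print("predicted:", classify(req, VERBALIZERS, toy_overlap_score))
```

Because the scorer is passed in as a function, the same pair-building code can serve a zero-shot setting (an off-the-shelf NLI model) or a fine-tuned one without changes.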

Paper Structure (16 sections, 2 equations, 5 tables, 1 algorithm)