Practical Guidelines for the Selection and Evaluation of Natural Language Processing Techniques in Requirements Engineering

Mehrdad Sabetzadeh, Chetan Arora

TL;DR

The chapter addresses the challenge of selecting effective NLP techniques for Requirements Engineering (RE) amid rapid NLP advances. It presents a three-stage automation framework (pre-processing, analysis, post-processing) and a pragmatic decision process for choosing among traditional NLP, feature-based ML, and large language models (LLMs), guided by data availability, category definitions, and resource constraints. It details enabling techniques (NLP pipelines, relevance measures, embeddings, pattern matching, ML, clustering, and LLMs) and offers concrete guidelines for evaluation, data handling, and prompting, emphasizing empirical validation and cost considerations. The work aims to accelerate entry into NLP4RE for newcomers and to serve as a living, adaptable reference as technologies evolve. Its practical impact lies in providing structured methods and evaluation guidance to improve the reliability and usefulness of automated RE tasks in industry.
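
As a rough illustration of the three-stage framework, here is a minimal sketch. It assumes a retrieval-style task in which requirements are ranked by relevance to a short query. All function names are hypothetical. TF-IDF cosine similarity stands in as one example of a relevance measure and is not presented as the chapter's prescribed technique. Pre-processing tokenizes the text, analysis scores each requirement against the query, and post-processing ranks the results and applies a threshold.

```python
import math
import re
from collections import Counter


# Stage 1 (pre-processing): lowercase the text and split it into alphanumeric tokens.
def preprocess(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


# Stage 2 (analysis): build TF-IDF vectors from raw term counts and a smoothed IDF,
# then compare vectors with cosine similarity (one possible relevance measure).
def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(doc).items()} for doc in docs]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Stage 3 (post-processing): sort by score and drop requirements below a threshold.
def rank_requirements(requirements: list[str], query: str,
                      threshold: float = 0.1) -> list[tuple[str, float]]:
    # The query is treated as one more document so that it shares the IDF statistics.
    docs = [preprocess(r) for r in requirements] + [preprocess(query)]
    vectors = tfidf_vectors(docs)
    query_vec = vectors[-1]
    scored = [(req, cosine(vec, query_vec)) for req, vec in zip(requirements, vectors[:-1])]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(req, score) for req, score in scored if score >= threshold]


if __name__ == "__main__":
    reqs = [
        "The system shall encrypt user passwords.",
        "The user interface shall display the company logo.",
        "Passwords shall be encrypted with AES.",
    ]
    for req, score in rank_requirements(reqs, "encrypt passwords"):
        print(f"{score:.2f}  {req}")
```

In this sketch, "encrypted" does not match "encrypt" because the pre-processing stage does no stemming or lemmatization. Adding such a step is the kind of pre-processing choice that can change the analysis results.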

Abstract

Natural Language Processing (NLP) is now a cornerstone of requirements automation. One compelling factor behind the growing adoption of NLP in Requirements Engineering (RE) is the prevalent use of natural language (NL) for specifying requirements in industry. NLP techniques are commonly used for automatically classifying requirements, extracting important information, e.g., domain models and glossary terms, and performing quality assurance tasks, such as ambiguity handling and completeness checking. With so many different NLP solution strategies available and the possibility of applying machine learning alongside, it can be challenging to choose the right strategy for a specific RE task and to evaluate the resulting solution in an empirically rigorous manner. In this chapter, we present guidelines for the selection of NLP techniques as well as for their evaluation in the context of RE. In particular, we discuss how to choose among different strategies such as traditional NLP, feature-based machine learning, and language-model-based methods. Our ultimate hope is for this chapter to serve as a stepping stone that helps newcomers to NLP4RE quickly familiarize themselves with the NLP technologies most pertinent to the RE field.

Paper Structure

This paper contains 16 sections and 4 figures.

Figures (4)

  • Figure 1: NLP4RE Steps
  • Figure 2: Identifying Suitable Enabling Technique(s) for a Specific Analysis Task
  • Figure 3: Illustration of (a) Constituency Parsing and (b) Dependency Parsing. Both Parsing Methods Require Sentence Detection and POS Tagging.
  • Figure 4: Illustration of Query-based Pattern Matching over the Annotations of Fig. 3.
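
Figure 4 concerns query-based pattern matching over linguistic annotations. Here is a minimal sketch of the idea, assuming the POS tags are already available. In this sketch they are hand-written Penn Treebank tags rather than the output of a real tagger. The query language is a plain regular expression over hypothetical one-letter tag codes, which is a simplified stand-in for whatever query mechanism the chapter itself uses.

```python
import re

# Hypothetical, hand-annotated sentence: (token, Penn Treebank POS tag) pairs.
# In a real NLP pipeline these annotations would come from a tokenizer and POS tagger.
SENTENCE = [
    ("The", "DT"), ("flight", "NN"), ("control", "NN"), ("system", "NN"),
    ("shall", "MD"), ("log", "VB"), ("every", "DT"), ("critical", "JJ"),
    ("sensor", "NN"), ("failure", "NN"), (".", "."),
]


def tag_code(tag: str) -> str:
    """Map a POS tag to a one-letter code so tag sequences can be searched with a regex."""
    if tag.startswith("JJ"):
        return "J"  # adjective
    if tag.startswith("NN"):
        return "N"  # noun (singular, plural, proper)
    if tag == "MD":
        return "M"  # modal verb, e.g. "shall"
    if tag.startswith("VB"):
        return "V"  # verb
    return "O"      # any other tag


def match_pattern(tokens: list[tuple[str, str]], pattern: str) -> list[str]:
    """Return the token spans whose tag codes match `pattern`."""
    codes = "".join(tag_code(tag) for _, tag in tokens)
    # Each token contributes exactly one code character, so match offsets are token indices.
    return [
        " ".join(word for word, _ in tokens[m.start():m.end()])
        for m in re.finditer(pattern, codes)
    ]


if __name__ == "__main__":
    print(match_pattern(SENTENCE, r"J*N+"))  # adjective(s) + noun(s): candidate glossary terms
    print(match_pattern(SENTENCE, r"MV"))    # modal followed by a verb
```

On this sentence, the pattern `J*N+` returns "flight control system" and "critical sensor failure". Both are plausible glossary-term candidates. The pattern `MV` picks out "shall log". Queries over parse trees or dependency relations, as in Fig. 3, follow the same principle but match richer structures than a flat tag sequence.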