Natural Language Processing for Requirements Traceability

Jin L. C. Guo, Jan-Philipp Steghöfer, Andreas Vogelsang, Jane Cleland-Huang

TL;DR

This chapter surveys natural language processing techniques for requirements traceability. It focuses on trace link recovery (TLR) and trace link maintenance (TLM) and also covers trace link explanation and link type prediction. The techniques span information retrieval, shallow and deep learning, and generative AI based on large language models (LLMs), including the vector space model (VSM), latent semantic indexing (LSI), latent Dirichlet allocation (LDA), BERT-based methods, and prompt-based reasoning, which together illustrate the evolving toolkit for automated traceability. The authors discuss evaluation practices, dataset limitations, and practical considerations for data collection, metrics, and human-in-the-loop validation, emphasizing the need for richer datasets and explanations. They argue that ubiquitous traceability remains aspirational, but that scalable, explainable NLP-enabled solutions and conversational AI interfaces offer promising directions. Open challenges remain in data, reliability, and integration into early development phases.

Abstract

Traceability, the ability to trace relevant software artifacts to support reasoning about the quality of the software and its development process, plays a crucial role in requirements and software engineering, particularly for safety-critical systems. In this chapter, we provide a comprehensive overview of the representative tasks in requirements traceability for which natural language processing (NLP) and related techniques have made considerable progress in the past decade. We first define traceability in the context of requirements and the overall engineering process and introduce other important concepts related to traceability tasks. We then discuss two tasks in detail: trace link recovery and trace link maintenance. We also introduce two related tasks that arise when trace links are used in practice. For each task, we explain its characteristics, how it can be approached with NLP techniques, and how to design and conduct experiments that demonstrate the performance of those techniques. We further discuss practical considerations for applying NLP techniques effectively and assessing their effectiveness, covering data set collection, metric selection, and the role of humans in evaluating NLP approaches. Overall, this chapter equips readers with the fundamental knowledge needed to design automated traceability solutions enabled by NLP in practice.
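
To make the idea of approaching trace link recovery through NLP more concrete, the following is a minimal sketch of a vector-space-model ranker: each artifact becomes a TF-IDF vector, and candidate links are ranked by cosine similarity. It is an illustration under stated assumptions, not the chapter's own implementation. The artifact identifiers and texts, the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_candidate_links`), and the smoothed IDF formula are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption of this sketch) so that terms occurring
    # in every document still keep a small positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidate_links(requirements, targets):
    """For each requirement, rank target artifacts by descending similarity."""
    req_ids, tgt_ids = list(requirements), list(targets)
    # Build one shared vocabulary over source and target artifacts.
    vecs = tfidf_vectors([requirements[r] for r in req_ids] +
                         [targets[t] for t in tgt_ids])
    req_vecs, tgt_vecs = vecs[:len(req_ids)], vecs[len(req_ids):]
    ranking = {}
    for rid, rv in zip(req_ids, req_vecs):
        scored = [(tid, cosine(rv, tv)) for tid, tv in zip(tgt_ids, tgt_vecs)]
        ranking[rid] = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranking


# Hypothetical artifacts, invented purely for illustration.
requirements = {
    "REQ-1": "The system shall encrypt user passwords before storing them.",
    "REQ-2": "The system shall export monthly reports as PDF files.",
}
targets = {
    "PasswordHasher": "Hashes and encrypts user passwords prior to storing them in the database.",
    "ReportExporter": "Generates monthly reports and exports them as PDF files.",
}
ranking = rank_candidate_links(requirements, targets)
for req_id, candidates in ranking.items():
    print(req_id, [(tid, round(score, 3)) for tid, score in candidates])
```

In a setup like this, an analyst would typically review the top-ranked candidates per requirement rather than accept them automatically. The sketch also omits stemming and stop-word removal, so "encrypt" and "encrypts" count as different terms here.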

Paper Structure (32 sections, 2 figures, 1 table)

Figures (2)

  • Figure 1: Requirements traceability comprises four fundamental activities: planning and managing the traceability strategy, creating links, maintaining links, and ultimately using links. NLP is particularly useful for supporting trace link creation and evolution.
  • Figure 2: Schematic pipeline for an ML classifier for TLR.