LLM-Driven Cost-Effective Requirements Change Impact Analysis
Romina Etezadi, Sallam Abualhaija, Chetan Arora, Lionel Briand
TL;DR
ProReFiCIA tackles the costly problem of change impact analysis (CIA) in requirements engineering by applying prompt-engineered large language models (LLMs) in a two-stage pipeline: robust model–prompt selection, followed by refinement and filtering. It achieves high recall (93.3%–95.8%) across two datasets while requiring analysts to review only 2.1%–8.5% of the requirements, and it is training-free, so it needs no pre-built training data. The study systematically evaluates 64 prompts across five LLMs, identifies the most robust model–prompt pair (LLaMa with P30), and shows that refinement and filtering further improve effectiveness while keeping costs low. Compared with baselines such as NARCIA and RAG-based methods, ProReFiCIA performs better with substantially lower review effort, which makes it practical for scalable CIA in industry contexts.
Abstract
Requirements are inherently subject to change throughout the software development lifecycle. Within the limited budget available to requirements engineers, manually identifying the impact of such changes on other requirements is both error-prone and effort-intensive. This can lead to overlooked impacted requirements, which, if not properly managed, can cause serious issues in downstream tasks. Inspired by the growing potential of large language models (LLMs) across diverse domains, we propose ProReFiCIA, an LLM-driven approach for automatically identifying the impacted requirements when changes occur. We conduct an extensive evaluation of ProReFiCIA using several LLMs and prompt variants tailored to this task. Using the best combination of an LLM and a prompt variant, ProReFiCIA achieves a recall of 93.3% on a benchmark dataset and 95.8% on a newly created industry dataset, demonstrating its effectiveness in identifying impacted requirements. Further, the cost of applying ProReFiCIA remains small, as the engineer only needs to review the generated results, which represent between 2.1% and 8.5% of the entire set of requirements.
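To make the two reported quantities concrete, the following is a minimal sketch of how recall and review effort could be computed for a single change. It assumes the standard definition of recall and takes review effort to be the share of all requirements that the approach flags for inspection; the excerpt does not spell out the exact formulas used in the evaluation. The function names and requirement identifiers are hypothetical and chosen for illustration only.

```python
def recall(predicted, impacted):
    """Share of truly impacted requirements that appear in the predicted set.

    Assumption: standard recall, |predicted & impacted| / |impacted|.
    """
    predicted, impacted = set(predicted), set(impacted)
    if not impacted:
        # No requirement is impacted, so nothing can be missed.
        return 1.0
    return len(predicted & impacted) / len(impacted)


def review_effort(predicted, total_requirements):
    """Share of the whole requirements set the engineer has to review.

    Assumption: the engineer reviews exactly the flagged requirements.
    """
    if total_requirements <= 0:
        raise ValueError("total_requirements must be positive")
    return len(set(predicted)) / total_requirements


# Hypothetical example: 200 requirements, 4 of them truly impacted by a change.
truly_impacted = {"R3", "R17", "R42", "R88"}
flagged = {"R3", "R17", "R42", "R101", "R150"}

print(f"recall = {recall(flagged, truly_impacted):.1%}")
print(f"review effort = {review_effort(flagged, 200):.1%}")
```

In this made-up example, three of the four impacted requirements are found and five of 200 requirements need review. High recall matters most here because a missed impacted requirement is the costly failure mode, while review effort captures the price paid in analyst time.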
