RECOVER: Toward Requirements Generation from Stakeholders' Conversations
Gianmario Voria, Francesco Casillo, Carmine Gravino, Gemma Catolino, Fabio Palomba
TL;DR
RECOVER presents a three-step pipeline to extract and generate system requirements from stakeholder conversations, combining ML-based classification, context-preserving processing, and LLM-driven generation. The approach prioritizes recall at the turn level and uses a structured evaluation with expert oracles and a ChatGPT baseline to demonstrate added value over generic LLM prompting. Across turn-level and whole-conversation analyses, RECOVER shows promising accuracy, completeness, and actionability, with in-vivo validation indicating robustness in noisy industrial settings, albeit with necessary human oversight to mitigate hallucinations and ensure traceability. The work advances conversational requirements engineering by enabling automated yet human-validated elicitation, and it provides replication materials and a path for extending to non-functional requirements and traceability features.
Abstract
Stakeholders' conversations in requirements elicitation meetings hold valuable insights into system and client needs. However, manually extracting requirements is time-consuming, labor-intensive, and prone to errors and biases. While current state-of-the-art methods assist in summarizing stakeholder conversations and classifying requirements based on their nature, there is a noticeable lack of approaches capable of both identifying requirements within these conversations and generating corresponding system requirements. Such approaches would support requirements identification and reduce engineers' workload, time, and effort. To address this gap, this paper introduces RECOVER (Requirements EliCitation frOm conVERsations), a novel conversational requirements engineering approach that leverages natural language processing and large language models (LLMs) to support practitioners in automatically extracting system requirements from stakeholder interactions. The approach is evaluated using a mixed-method study that combines performance analysis with a user study involving requirements engineers, targeting two levels of granularity. First, at the conversation turn level, the evaluation measures RECOVER's accuracy in identifying requirements-relevant dialogue and the quality of generated requirements in terms of correctness, completeness, and actionability. Second, at the entire conversation level, the evaluation assesses the overall usefulness and effectiveness of RECOVER in synthesizing comprehensive system requirements from full stakeholder discussions. Empirical evaluation of RECOVER shows promising performance, with generated requirements demonstrating satisfactory correctness, completeness, and actionability. The results also highlight the potential of automating requirements elicitation from conversations as an aid that enhances efficiency while maintaining human oversight.
