Using LLMs in Software Requirements Specifications: An Empirical Evaluation

Madhava Krishna; Bhagesh Gaur; Arsh Verma; Pankaj Jalote

TL;DR

The paper investigates whether large language models can effectively draft, validate, and correct software requirements specifications, addressing time and quality concerns in requirements engineering. It empirically compares GPT-4 and CodeLlama against a human benchmark for an SRS of a university club management portal, using a formal, multi-criteria evaluation. Key findings show GPT-4 provides strong validation and guidance, CodeLlama yields more verbose and comprehensive drafts but can incur hallucinations, and both can substantially reduce the time to produce SRS documents, particularly for less experienced engineers. The work demonstrates the practical potential of LLM-assisted requirements engineering while outlining limitations and directions for refining prompting, model selection, and domain specialization.

Abstract

The creation of a Software Requirements Specification (SRS) document is important for any software development project. Given the recent prowess of Large Language Models (LLMs) in answering natural language queries and generating sophisticated textual outputs, our study explores their capability to produce accurate, coherent, and structured drafts of these documents to accelerate the software development lifecycle. We assess the performance of GPT-4 and CodeLlama in drafting an SRS for a university club management system and compare it against human benchmarks using eight distinct criteria. Our results suggest that LLMs can match the output quality of an entry-level software engineer in generating an SRS, delivering complete and consistent drafts. We also evaluate the capabilities of LLMs to identify and rectify problems in a given requirements document. Our experiments indicate that GPT-4 is capable of identifying issues and giving constructive feedback for rectifying them, while CodeLlama's validation results were not as encouraging. We repeated the generation exercise for four distinct use cases to study the time saved by employing LLMs for SRS generation. The experiment demonstrates that LLMs may facilitate a significant reduction in development time for entry-level software engineers. Hence, we conclude that LLMs can be gainfully used by software engineers to increase productivity by saving time and effort in generating, validating, and rectifying software requirements.

Paper Structure (17 sections, 1 equation, 4 figures, 7 tables)

Introduction
Related Works
Methodology
Task definition
Benchmark for SRS Generation
Document generation with GPT-4
Document Generation with CodeLlama
Evaluation Strategy for SRS documents
Validation and Correction of Requirements
Quality of Generated SRS Documents
Per-Requirement Evaluation
Validation and Correction of Software Requirements
Validation of Requirements
Correcting Requirements
Impact on Effort
...and 2 more sections

Figures (4)

Figure 1: The format of the SRS used for the study.
Figure 2: Overall SRS evaluation. The graph corresponds to the document-wide evaluation parameters and was obtained by averaging the ratings provided by human graders.
Figure 3: Per-requirement evaluation results for the three SRS documents. Each graph corresponds to one section of the SRS and was obtained by averaging the ratings provided by human graders for that part of the SRS.
Figure 4: The mean deviations of the LLM-obtained ratings from human ratings averaged for each section.
