Can ChatGPT Generate Realistic Synthetic System Requirement Specifications? Results of a Case Study

Alex R. Mattukat; Florian M. Braun; Horst Lichter

TL;DR

Overall, the study generated SSyRSs with ChatGPT that were realistic to a certain extent, but LLM-based quality assessments cannot fully replace thorough expert evaluations.

Abstract

System requirement specifications (SyRSs) are central, natural-language (NL) artifacts. Access to real SyRSs for research purposes is highly valuable but limited by proprietary restrictions or confidentiality concerns. Generating synthetic SyRSs (SSyRSs) can address this scarcity. Black-box large language models (LLMs) such as ChatGPT offer compelling generation capabilities by providing easy access to NL generation functions without requiring access to real data. However, LLMs suffer from hallucinations and overconfidence, which pose major challenges for their use. We designed an exploratory study to investigate whether, despite these challenges, we can generate realistic SSyRSs with ChatGPT without having access to real SyRSs. Using a systematic approach that leverages prompt patterns, LLM-based quality assessments, and iterative prompt refinements, we generated 300 SSyRSs across 10 industries with ChatGPT. The results were evaluated using cross-model checks and an expert study with n=87 submitted surveys. Of the experts, 62% considered the SSyRSs to be realistic. However, in-depth examination revealed contradictory statements and deficiencies. Overall, we were able to generate realistic SSyRSs to a certain extent with ChatGPT, but LLM-based quality assessments cannot fully replace thorough expert evaluations. This paper presents the methodology and results of our study and discusses the key insights we obtained.

Paper Structure (27 sections, 3 figures, 7 tables)

INTRODUCTION
RESEARCH DESIGN
LLM-based Generation and Assessments of SSyRSs
Phase 1 - Preparation
Selected Domains:
SSyRS template:
Quality Properties:
Generation Prompt:
Quality Assessments:
Phase 2 - Generation and Assessment:
Phase 3 - Analysis and Refinement:
Process Execution
RESULTS
Size Statistics
Quality Statistics
...and 12 more sections

Figures (3)

Figure 1: The SSyRS generation process (colors indicate loops, italic comments describe loop conditions).
Figure 2: Excerpt of the logistics SSyRS "Dynamic Freight Optimization Platform (DFOP)". The whole SSyRS can be found in our GitHub repository.
Figure 3: Distribution of the overall rating of the degree of realism of the SSyRSs.
