Model Generation with LLMs: From Requirements to UML Sequence Diagrams

Alessio Ferrari, Sallam Abualhaija, Chetan Arora

TL;DR

This study investigates ChatGPT's ability to generate UML sequence diagrams from natural-language requirements, using 28 industrial documents across multiple domains and 87 variants. Through a structured evaluation of five quality criteria and a thematic analysis of error patterns, the authors find that the diagrams tend to be understandable and to conform to the notation, but often omit or misrepresent requirements, especially when the inputs are ambiguous or inconsistent. The work highlights a broad spectrum of issues, ranging from completeness and correctness to terminology drift and traceability, underscoring the need for RE-specific prompting, domain context, and human-in-the-loop validation. In practice, the findings suggest augmenting ChatGPT-assisted diagram generation with iterative prompts, context provision, and stakeholder-focused explanations to improve reliability in requirements engineering workflows.

Abstract

Complementing natural language (NL) requirements with graphical models can improve stakeholders' communication and provide directions for system design. However, creating models from requirements involves manual effort. The advent of generative large language models (LLMs), ChatGPT being a notable example, offers promising avenues for automated assistance in model generation. This paper investigates the capability of ChatGPT to generate a specific type of model, i.e., UML sequence diagrams, from NL requirements. We conduct a qualitative study in which we examine the sequence diagrams generated by ChatGPT for 28 requirements documents of various types and from different domains. Observations from the analysis of the generated diagrams have systematically been captured through evaluation logs, and categorized through thematic analysis. Our results indicate that, although the models generally conform to the standard and exhibit a reasonable level of understandability, their completeness and correctness with respect to the specified requirements often present challenges. This issue is particularly pronounced in the presence of requirements smells, such as ambiguity and inconsistency. The insights derived from this study can influence the practical utilization of LLMs in the RE process, and open the door to novel RE-specific prompting strategies targeting effective model generation.

Paper Structure

This paper contains 13 sections, 12 figures, 2 tables.

Figures (12)

  • Figure 1: Example requirements for an "elevator system" and the corresponding sequence diagram.
  • Figure 2: Violin plots for the different evaluation criteria.
  • Figure 3: "Summarization Issues".
  • Figure 4: "Poor Requirements Quality and Model Omissions".
  • Figure 5: "Inconsistency and Model Omissions", "Inconsistency and Model Incorrectness", and "Incoherence Manifestations".
  • ...and 7 more figures