Unified Modeling Language Code Generation from Diagram Images Using Multimodal Large Language Models

Averi Bates, Ryan Vavricka, Shane Carleton, Ruosi Shao, Chongle Pan

TL;DR

This work tackles the challenge of generating executable UML code from diagram images using a multimodal large language model (MM-LLM). It builds a framework around LLaVA-1.5 and compares standard fine-tuning with LoRA at the 7B and 13B model sizes. The models are trained on a large synthetic PlantUML dataset of activity and sequence diagrams, and a modest real-world test set is used to assess generalization. The results show that larger models and larger datasets improve BLEU and SSIM, with the 13B LoRA model achieving the top scores on sequence diagrams, while generalization to real-world diagrams remains challenging because of domain gaps. The findings highlight the practicality of domain-adapted MM-LLMs for automating UML image-to-code conversion and point to future work on more realistic data generation and more advanced evaluation metrics to close the gap between synthetic and real-world diagrams.

Abstract

The Unified Modeling Language (UML) is a standardized visual language widely used for modeling and documenting the design of software systems. Although many tools generate UML diagrams from UML code, generating executable UML code from image-based UML diagrams remains challenging. This paper proposes a new approach that automatically generates UML code from diagram images using a multimodal large language model (MM-LLM). Synthetic UML activity and sequence diagram datasets were created to train and test the model. We compared standard fine-tuning with LoRA to optimize the base models. The experiments measured code generation accuracy across different model sizes and training strategies. The results demonstrate that domain-adapted MM-LLMs perform well at automating UML code generation: the best model achieved BLEU and SSIM scores of 0.779 and 0.942, respectively, on sequence diagrams. This can help modernize legacy systems and reduce manual effort in software development workflows.
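To make the BLEU figure above more concrete, here is a minimal sketch of how a BLEU-style score could be computed between a generated PlantUML snippet and its reference. It is a simplified, single-reference variant with whitespace tokenization and no smoothing. The function name `simple_bleu` and the two PlantUML strings are illustrative assumptions, not the paper's evaluation code, and the paper's actual tokenization and BLEU implementation may differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU: geometric mean of clipped
    n-gram precisions (n = 1..max_n) times a brevity penalty.
    Assumption: whitespace tokenization, no smoothing."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives 0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical reference and model output for a small sequence diagram.
reference = "@startuml\nUser -> Server: login\nServer --> User: ok\n@enduml"
generated = "@startuml\nUser -> Server: login\nServer --> User: denied\n@enduml"

score = simple_bleu(generated, reference)
print(f"BLEU-style score: {score:.3f}")
```

An exact match scores 1.0, and a single wrong token lowers every n-gram precision that overlaps it, which is why short PlantUML outputs are sensitive to small mistakes. SSIM, by contrast, is an image similarity measure rather than a text measure, so it is not covered by this sketch.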

Paper Structure

This paper contains 25 sections, 30 figures, and 8 tables.

Figures (30)

  • Figure 1: Activity diagram showing the decision paths for valid and invalid requests.
  • Figure 2: PlantUML code that represents the activity diagram structure, including the decision points and action paths.
  • Figure 3: Sequence diagram representing validating user information and granting access.
  • Figure 4: PlantUML code that defines the interactions for the user login process shown in the sequence diagram.
  • Figure 5: Sequence diagram depicting participants, messages, and activation events.
  • ...and 25 more figures