Assessing UML Diagrams by GPT: Implications for Education

Chong Wang; Beian Wang; Peng Liang; Jie Liang

Assessing UML Diagrams by GPT: Implications for Education

Chong Wang, Beian Wang, Peng Liang, Jie Liang

TL;DR

This study evaluates the feasibility of using GPT-4o to automatically grade UML diagrams in software modeling education by defining 11 evaluation criteria across use case, class, and sequence diagrams and validating them with 40 students. Using a role-based prompt, GPT-4o grades three diagram types and produces detailed deductions, which are then compared to human expert scores. The findings show GPT can perform automatic assessment and provide personalized feedback, but it generally underperforms relative to human graders, with consistent gaps that vary by diagram type and criteria. The results highlight both the promise and current limitations of AI-assisted grading in SE education, suggesting directions for prompt engineering, criterion refinement, and broader studies to extend AI support to UML modeling tasks and other domains.

Abstract

In software engineering (SE) research and practice, UML is well known as an essential modeling methodology for requirements analysis and software modeling in both academia and industry. In particular, fundamental knowledge of UML modeling and practice in creating high-quality UML diagrams are included in SE-relevant courses in the undergraduate programs of many universities. This leads to a time-consuming and labor-intensive task for educators to review and grade a large number of UML diagrams created by the students. Recent advances in generative AI techniques, such as GPT, have paved new ways to automate many SE tasks. However, current research or tools seldom explore the capabilities of GPT in evaluating the quality of UML diagrams. This paper aims to investigate the feasibility and performance of GPT in assessing the quality of UML use case diagrams, class diagrams, and sequence diagrams. First, 11 evaluation criteria with grading details were proposed for these UML diagrams. Next, a series of experiments was designed and conducted on 40 students' UML modeling reports to explore the performance of GPT in evaluating and grading these UML diagrams. The research findings reveal that GPT can complete this assessment task, but it cannot replace human experts yet. Meanwhile, there are five evaluation discrepancies between GPT and human experts. These discrepancies vary in the use of different evaluation criteria in different types of UML diagrams, presenting GPT's strengths and weaknesses in this automatic evaluation task.

Assessing UML Diagrams by GPT: Implications for Education

TL;DR

Abstract

Assessing UML Diagrams by GPT: Implications for Education

TL;DR

Abstract

Paper Structure

Table of Contents

Figures (6)