Multi-modal Summarization in Model-Based Engineering: Automotive Software Development Case Study

Nenad Petrovic, Yurui Zhang, Moaad Maaroufi, Kuo-Yi Chao, Lukasz Mazur, Fengjunjie Pan, Vahid Zolfaghari, Alois Knoll

TL;DR

This paper addresses the challenge of multimodal MBSE diagram understanding in automotive software by evaluating a range of multimodal large language models on UML/EMF diagrams. It conducts a focused automotive experiment that uses five diagram-based questions to compare model capabilities, and it finds strong performance on simple content but limitations on attributes, functions, and diagram diffs. A proof-of-concept web service built on InternVL2-8B-MPO demonstrates practical deployment and workflow integration with CARLA-based code generation and ISO/OCL checks. The findings show both the promise and the current limitations of multimodal summarization for automotive MBSE, and they outline directions for future benchmarking, dataset development, and scalable tooling. The work offers a path toward automated, in-domain diagram summarization that can support maintainability and updates in automotive engineering.

Abstract

Multimodal summarization, which integrates information from diverse data modalities, is a promising way to aid the understanding of information within various processes. However, its application and advantages have received little attention in model-based engineering (MBE), which has become a cornerstone in the design and development of complex systems, leveraging formal models to improve understanding, validation, and automation throughout the engineering lifecycle. UML and EMF diagrams in model-based engineering contain a large amount of multimodal information and intricate relational data. Hence, our study explores the application of multimodal large language models within the domain of model-based engineering to evaluate their capacity for understanding and identifying relationships, features, and functionalities embedded in UML and EMF diagrams. We aim to demonstrate the potential benefits and limitations of multimodal summarization in improving productivity and accuracy in MBE practices. The proposed approach is evaluated in the context of automotive software development, taking many promising state-of-the-art models into account.

Paper Structure

This paper contains 11 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: MLLM-based diagram prompting for product updates identification.
  • Figure 2: The workflow of MLLM-based image prompting for automotive scenarios.