LLM-enabled Instance Model Generation

Fengjunjie Pan, Nenad Petrovic, Vahid Zolfaghari, Long Wen, Alois Knoll

TL;DR

This work tackles the challenge of automatically generating valid XMI-based instance models from Ecore metamodels and natural language specifications using large language models. It introduces a two-step workflow: first, a language model produces a format-independent conceptual instance model; second, a dedicated instance compiler converts this intermediate representation into a syntactically valid XMI file via PyEcore. The approach significantly improves the validity and usefulness of generated instance models across both commercial and open-source LLMs, with smaller models like Llama 3.1 70B achieving semantic recall comparable to larger models in this framework. The method enables reliable integration of LLMs into model-based engineering workflows and paves the way for extensions such as retrieval-augmented generation and iterative generation strategies. Overall, the paper demonstrates a practical path to scalable, semantically accurate instance model generation within MOF/EMF ecosystems.

Abstract

In the domain of model-based engineering, models are essential components that enable system design and analysis. Traditionally, the creation of these models has been a manual process requiring not only deep modeling expertise but also substantial domain knowledge of target systems. With the rapid advancement of generative artificial intelligence, large language models (LLMs) show potential for automating model generation. This work explores the generation of instance models using LLMs, focusing specifically on producing XMI-based instance models from Ecore metamodels and natural language specifications. We observe that current LLMs struggle to directly generate valid XMI models. To address this, we propose a two-step approach: first, using LLMs to produce a simplified structured output containing all necessary instance model information, namely a conceptual instance model, and then compiling this intermediate representation into a valid XMI file. The conceptual instance model is format-independent, allowing it to be transformed into various modeling formats via different compilers. The feasibility of the proposed method has been demonstrated using several LLMs, including GPT-4o, o1-preview, and Llama 3.1 (8B and 70B). Results show that the proposed method significantly improves the usability of LLMs for instance model generation tasks. Notably, the smaller open-source model, Llama 3.1 70B, demonstrated performance comparable to proprietary GPT models within the proposed framework.
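
The second step of the two-step approach, compiling a conceptual instance model into XMI, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the JSON shape of the conceptual instance model (`objects`, `containments`), the class and feature names, the namespace URI, and the function name `compile_to_xmi` are hypothetical and not taken from the paper. The sketch also uses only the Python standard library instead of PyEcore, so it does not check the result against an Ecore metamodel and it omits cross-references and `xsi:type` annotations.

```python
import json
import xml.etree.ElementTree as ET

XMI_NS = "http://www.omg.org/XMI"

# Hypothetical conceptual instance model, as an LLM might return it in step one.
CIM_JSON = """
{
  "objects": [
    {"id": "sys", "class": "System", "attributes": {"name": "demo"}},
    {"id": "vm1", "class": "VirtualMachine",
     "attributes": {"name": "vm1", "cores": 2}}
  ],
  "containments": [
    {"parent": "sys", "feature": "vms", "child": "vm1"}
  ]
}
"""


def compile_to_xmi(cim, ns_prefix, ns_uri):
    """Compile a conceptual instance model (dict) into an XMI-style string."""
    ET.register_namespace("xmi", XMI_NS)
    ET.register_namespace(ns_prefix, ns_uri)

    objects = {obj["id"]: obj for obj in cim["objects"]}
    children = {}      # parent id -> list of (feature name, child id)
    contained = set()  # ids of objects that already have a container
    for link in cim.get("containments", []):
        parent, child = link["parent"], link["child"]
        if parent not in objects or child not in objects:
            raise ValueError(f"containment refers to unknown object: {link}")
        if child in contained:
            raise ValueError(f"object {child!r} is contained more than once")
        contained.add(child)
        children.setdefault(parent, []).append((link["feature"], child))

    # An XMI resource in this sketch has exactly one root object.
    roots = [obj_id for obj_id in objects if obj_id not in contained]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root object, found {len(roots)}")

    emitted = set()

    def fmt(value):
        # XML booleans are lowercase; everything else is stringified as-is.
        if isinstance(value, bool):
            return "true" if value else "false"
        return str(value)

    def build(obj_id, parent_elem, feature):
        obj = objects[obj_id]
        if parent_elem is None:
            # The root element is named after its class in the model namespace.
            elem = ET.Element(f"{{{ns_uri}}}{obj['class']}")
            elem.set(f"{{{XMI_NS}}}version", "2.0")
        else:
            # Contained objects are named after the containment feature.
            elem = ET.SubElement(parent_elem, feature)
        for name, value in obj.get("attributes", {}).items():
            elem.set(name, fmt(value))
        emitted.add(obj_id)
        for child_feature, child_id in children.get(obj_id, []):
            build(child_id, elem, child_feature)
        return elem

    root = build(roots[0], None, None)
    if len(emitted) != len(objects):
        raise ValueError("some objects are unreachable from the root")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(compile_to_xmi(json.loads(CIM_JSON), "alloc", "http://example.org/alloc"))
```

The point of the intermediate representation is visible in the structure of the sketch: the LLM only has to produce a flat list of objects and containment links, while the deterministic compiler takes care of nesting, namespaces, and serialization, and rejects inputs that cannot form a well-formed containment tree.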

Paper Structure

This paper contains 18 sections, 5 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: Generation of an instance model based on a given metamodel and specifications. The prompt and specifications are provided as textual input to the LLM. The metamodel and instance model are exchanged in a modeling format.
  • Figure 2: An example of a metamodel illustrating a virtualization-based software-hardware resource allocation system [Pan2023]. The metamodel is visualized using Eclipse EMF.
  • Figure 3: An instance model example created from the metamodel in Fig. 2. This model was generated using the proposed method with GPT-4o. This instance model is visualized using Eclipse EMF.
  • Figure 4: Overview of the proposed method for instance model generation. Model files are shown in green, natural language-based texts in gray, and computer programs in yellow.
  • Figure 5: Test set information: distribution of elements in metamodels and instance models, along with the character length of the specifications for each instance model.