LLM-enabled Instance Model Generation
Fengjunjie Pan, Nenad Petrovic, Vahid Zolfaghari, Long Wen, Alois Knoll
TL;DR
This work tackles the challenge of automatically generating valid XMI-based instance models from Ecore metamodels and natural language specifications using large language models. It introduces a two-step workflow: first, a language model produces a format-independent conceptual instance model; second, a dedicated instance compiler converts this intermediate representation into a syntactically valid XMI file via PyEcore. The approach significantly improves the validity and usefulness of generated instance models across both commercial and open-source LLMs, with smaller models such as Llama 3.1 70B achieving semantic recall comparable to that of larger models within this framework. The method enables reliable integration of LLMs into model-based engineering workflows and paves the way for extensions such as retrieval-augmented generation and iterative generation strategies. Overall, the paper demonstrates a practical path to scalable, semantically accurate instance model generation within MOF/EMF ecosystems.
Abstract
In the domain of model-based engineering, models are essential components that enable system design and analysis. Traditionally, the creation of these models has been a manual process requiring not only deep modeling expertise but also substantial domain knowledge of target systems. With the rapid advancement of generative artificial intelligence, large language models (LLMs) show potential for automating model generation. This work explores the generation of instance models using LLMs, focusing specifically on producing XMI-based instance models from Ecore metamodels and natural language specifications. We observe that current LLMs struggle to directly generate valid XMI models. To address this, we propose a two-step approach: first, an LLM produces a simplified structured output containing all necessary instance model information, namely a conceptual instance model; second, this intermediate representation is compiled into a valid XMI file. The conceptual instance model is format-independent, allowing it to be transformed into various modeling formats via different compilers. The feasibility of the proposed method is demonstrated using several LLMs, including GPT-4o, o1-preview, and Llama 3.1 (8B and 70B). Results show that the proposed method significantly improves the usability of LLMs for instance model generation tasks. Notably, the smaller open-source model, Llama 3.1 70B, demonstrated performance comparable to proprietary GPT models within the proposed framework.
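To make the compile step of the two-step approach concrete, the following is a minimal sketch of turning a conceptual instance model into an XMI string. It is an illustration under stated assumptions, not the paper's implementation: the dictionary schema (`root`, `objects`, `attributes`, `children`), the `Library`/`Book` metamodel, the namespace URI, and the function `compile_to_xmi` are all hypothetical, and the Python standard library stands in for the PyEcore-based compiler so the example stays self-contained. Only containment relationships and plain attributes are handled.

```python
import xml.etree.ElementTree as ET

XMI_NS = "http://www.omg.org/XMI"

# Hypothetical conceptual instance model, as an LLM might return it.
# The schema below is an assumption made for this sketch.
conceptual_model = {
    "root": "lib",
    "objects": [
        {"id": "lib", "type": "Library",
         "attributes": {"name": "City Library"},
         "children": {"books": ["b1", "b2"]}},
        {"id": "b1", "type": "Book",
         "attributes": {"title": "Dune"}, "children": {}},
        {"id": "b2", "type": "Book",
         "attributes": {"title": "Emma"}, "children": {}},
    ],
}


def compile_to_xmi(model, ns_prefix, ns_uri):
    """Compile the intermediate dict into an XMI string (containment only)."""
    ET.register_namespace("xmi", XMI_NS)
    ET.register_namespace(ns_prefix, ns_uri)
    objects = {obj["id"]: obj for obj in model["objects"]}

    def build(obj_id, tag, parent=None):
        # A KeyError here signals a reference to an undefined object.
        obj = objects[obj_id]
        attrs = {key: str(value) for key, value in obj["attributes"].items()}
        if parent is None:
            elem = ET.Element(tag, attrs)
        else:
            elem = ET.SubElement(parent, tag, attrs)
        # Contained objects become child elements named after the feature.
        for feature, child_ids in obj["children"].items():
            for child_id in child_ids:
                build(child_id, feature, elem)
        return elem

    root_type = objects[model["root"]]["type"]
    root = build(model["root"], f"{{{ns_uri}}}{root_type}")
    root.set(f"{{{XMI_NS}}}version", "2.0")
    return ET.tostring(root, encoding="unicode")


xmi = compile_to_xmi(conceptual_model, "library", "http://example.org/library")
print(xmi)
```

Because the language model only has to fill in the intermediate dictionary, all XMI-specific syntax (namespaces, the version attribute, element nesting) is produced deterministically by the compiler. A second compiler targeting another format could consume the same dictionary unchanged.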
