ModiGen: A Large Language Model-Based Workflow for Multi-Task Modelica Code Generation

Jiahui Xiang, Tong Ye, Peiyu Liu, Yinan Zhang, Wenhai Wang

TL;DR

This work tackles the challenge of generating simulatable Modelica code with large language models by introducing ModiGen, a three-stage workflow that combines data preprocessing and fine-tuning, GraphRAG-based knowledge augmentation, and feedback-driven validation. The authors build benchmark datasets for Modelica component and test-case generation from multiple libraries, enabling systematic evaluation. They demonstrate substantial performance gains, with maximum pass@1 improvements of 0.3349 for components and 0.2457 for test cases, and show that larger, well-tuned models can surpass proprietary baselines on certain tasks. The study provides evidence that integrating fine-tuning, structured knowledge retrieval, and iterative feedback can significantly improve the reliability and correctness of LLM-generated Modelica code, advancing automated modeling tools for engineering applications.

Abstract

Modelica is a widely adopted language for simulating complex physical systems, yet effective model creation and optimization require substantial domain expertise. Although large language models (LLMs) have demonstrated promising capabilities in code generation, their application to modeling remains largely unexplored. To address this gap, we have developed benchmark datasets specifically designed to evaluate the performance of LLMs in generating Modelica component models and test cases. Our evaluation reveals substantial limitations in current LLMs, as the generated code often fails to simulate successfully. To overcome these challenges, we propose a specialized workflow that integrates supervised fine-tuning, graph retrieval-augmented generation, and feedback optimization to improve the accuracy and reliability of Modelica code generation. The evaluation results demonstrate significant performance gains: the maximum improvement in pass@1 reached 0.3349 for the component generation task and 0.2457 for the test case generation task. This research underscores the potential of LLMs to advance intelligent modeling tools and offers valuable insights for future developments in system modeling and engineering applications.
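The pass@1 figures above can be read with a small worked example. This section does not spell out the paper's exact evaluation protocol, so the sketch below assumes the standard unbiased pass@k estimator used in code-generation benchmarks, 1 - C(n - c, k) / C(n, k), where n is the number of samples generated for a task and c is the number that pass (here, presumably, that simulate successfully). It also assumes the reported improvement is an absolute difference in pass@1. The function names and sample counts are hypothetical and are not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task: 1 - C(n - c, k) / C(n, k).

    n: samples generated, c: samples that passed, k: sampling budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, each given as (n, c)."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)


# Illustrative (n, c) counts only; these are NOT results from the paper.
baseline = [(10, 2), (10, 0), (10, 5)]
improved = [(10, 6), (10, 3), (10, 8)]

# Absolute pass@1 gain of the hypothetical improved system over the baseline.
gain = mean_pass_at_k(improved, 1) - mean_pass_at_k(baseline, 1)
print(f"pass@1 gain: {gain:.4f}")
```

For k = 1 the estimator reduces to c / n, the fraction of generated samples that pass, so under these assumptions a gain of 0.3349 would mean roughly one additional passing sample for every three generated.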

Paper Structure

This paper contains 22 sections, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: The Simulation Curve of Height and Velocity Variation of Bouncing Ball.
  • Figure 2: Overview of the Modelica Code Generation Workflow Architecture.
  • Figure 3: The Distribution of Structural Types within the Datasets.
  • Figure 4: Property Graph Representation of Test_RealGreat Model.
  • Figure 5: The Flowchart of Modelica Code Generation and Feedback Optimization Strategy.
  • ...and 2 more figures