A Hybrid Approach for EMF Code Generation: Code Templates Meet Large Language Models

Xiao He, Ru Chen, Zeqing Zhang, Yanling Wang, Qiuyan Dong

TL;DR

Template-based code generation is rigid and prone to incomplete coverage, while pure LLM approaches risk faulty code in multi-class projects. iEcoreGen combines EMF's template-based generation with LLM-driven completion: it derives per-operation natural-language specifications from an Ecore model, generates a Java skeleton with docstrings, then has LLMs fill in and fix the unimplemented parts. The study shows that iEcoreGen improves functional correctness (pass@k) across five open-source LLMs while maintaining comparable compilability, suggesting that constraining LLMs with templates can yield safer, more scalable automation. This work provides a practical pathway for combining model-driven engineering with neural code generation for more reliable automated software development. It also identifies concrete failure modes and design considerations for future hybrid systems and benchmarking in this space.

Abstract

Template-based and LLM-based code generation are both key enablers of automated software development. The former provides correctness guarantees but is rigid for complex requirements, whereas LLMs offer high flexibility at the risk of producing faulty code. This paper proposes iEcoreGen, a hybrid approach that integrates the Eclipse Modeling Framework (EMF) and LLMs. In EMF, an Ecore model defines a system's structure and acts as a blueprint for code generation. iEcoreGen decomposes requirements to derive operation specifications, uses EMF's template-based generator to produce initial Java code, and serializes the specifications into docstrings. LLMs are then invoked to complete and fix unimplemented methods. We assessed iEcoreGen on twenty code-generation tasks across five LLMs. It surpasses LLM-only baselines on pass@k and performs on par with them on compilation@k. An ablation study clarified the contribution of each component of iEcoreGen. Overall, the findings indicate that LLM-enhanced model-driven development is a promising path toward more efficient software automation.
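
To make the two-stage workflow concrete, the following is a minimal sketch of what a template-generated skeleton and its later completion might look like. The `Library` class, its methods, and the specification text are hypothetical illustrations and are not taken from the paper. The stub body mirrors the placeholder that EMF's generator commonly emits for operations without an implementation; the completed body stands in for what an LLM could produce from the docstring.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model class; names and fields are illustrative only. */
class Library {
    private final List<Boolean> availability = new ArrayList<>();

    /** Records one book and whether it is currently available. */
    void addBook(boolean available) {
        availability.add(available);
    }

    /**
     * Stage 1 (template output, assumed shape): the operation specification
     * is serialized into the docstring and the body is left as a stub.
     *
     * Specification (hypothetical): return the number of books that are
     * currently available.
     */
    int countAvailableSkeleton() {
        // Placeholder for an operation that has not been implemented yet.
        throw new UnsupportedOperationException();
    }

    /**
     * Stage 2 (assumed shape): the same operation after a completion step
     * has replaced the stub with a body that follows the specification.
     */
    int countAvailable() {
        int count = 0;
        for (boolean available : availability) {
            if (available) {
                count++;
            }
        }
        return count;
    }
}
```

Keeping the class structure and signatures fixed by the template means the model is only asked to write method bodies, which is one plausible reading of why compilability stays comparable while functional correctness improves.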

Paper Structure

This paper contains 25 sections, 1 equation, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Code generation in MDE
  • Figure 2: Workflow of iEcoreGen
  • Figure 3: An example of code completion prompt
  • Figure 4: An example of code fix prompt
  • Figure 5: Error case examples