LLM Agents for Generating Microservice-based Applications: how complex is your specification?
Daniel M. Yellin
TL;DR
The study tackles generating microservice-based applications (MSBAs) from full specifications using LLM Agents, introducing a standardized MSBA specification template and a complexity metric that quantifies how difficult a specification is. Evaluating 8 microservices across two applications with multiple LLMs, the author shows that performance declines as specification complexity rises: strong models handle medium-difficulty specifications fairly well but do poorly on high-difficulty ones. The study also demonstrates that a fine-grained, per-request approach to code generation yields substantially higher correctness than coarse-grained generation, offering a practical path forward for LLM-assisted MSBA development. Overall, the work provides both a framework for specification-based MSBA generation and actionable insights into making LLM-generated code more reliable.
Abstract
In this paper we evaluate the capabilities of LLM Agents in generating code for real-world problems. Specifically, we explore code synthesis for microservice-based applications, a widely used architectural pattern for building applications. We define a standard template for specifying these applications, and we propose a metric for scoring the difficulty of a specification: the higher the score, the more difficult it is to generate code for the specification. Our experimental results show that agents using strong LLMs (like GPT-3o-mini) do fairly well on medium-difficulty specifications but do poorly on those of higher difficulty. The drop at higher difficulty stems from more intricate business logic, greater use of external services, database integration, and the inclusion of non-functional capabilities such as authentication. We analyze the errors in LLM-synthesized code and report on the key challenges LLM Agents face in generating code for these specifications. Finally, we show that a fine-grained approach to code generation improves the correctness of the generated code.
