Can AI Agents Generate Microservices? How Far are We?

Bassam Adnan, Matteo Esposito, Davide Taibi, Karthik Vaidhyanathan

TL;DR

This work examines whether AI agents can generate functional microservices and how different forms of contextual information influence their performance. It evaluates the generated services on functional correctness, code quality, and efficiency.

Abstract

LLMs have advanced code generation, but their use for generating microservices with explicit dependencies and API contracts remains understudied. We examine whether AI agents can generate functional microservices and how different forms of contextual information influence their performance. We assess 144 generated microservices across 3 agents, 4 projects, 2 prompting strategies, and 2 scenarios. Incremental generation operates within existing systems and is evaluated with unit tests. Clean state generation starts from requirements alone and is evaluated with integration tests. We analyze functional correctness, code quality, and efficiency. Minimal prompts outperformed detailed ones in incremental generation, with 50-76% unit test pass rates. Clean state generation produced higher integration test pass rates (81-98%), indicating strong API contract adherence. Generated code showed lower complexity than human baselines. Generation times varied widely across agents, averaging 6-16 minutes per service. AI agents can produce microservices with maintainable code, yet inconsistent correctness and reliance on human oversight show that fully autonomous microservice generation is not yet achievable.

Paper Structure

This paper contains 29 sections, 8 figures, and 4 tables.

Figures (8)

  • Figure 1: Overview of the Study Design
  • Figure 2: Example of structural mismatch in Clean State generation (Train-Ticket payment service): baseline tests expect com.trainticket.* package with Money and AddMoneyRepository classes, but Codex generated payment.* package with MoneyTransaction and MoneyTransactionRepository, causing compilation failures despite functional correctness
  • Figure 3: Code Quality Metrics comparison: Lines of Code (LoC), Cyclomatic Complexity (CycC), and Cognitive Complexity (CogC). Top row shows comparison by agent, bottom row shows comparison by configuration.
  • Figure 4: Time distribution across agents and configurations
  • Figure 5: Cost distribution across agents and configurations
  • ...and 3 more figures