Can AI Agents Generate Microservices? How Far are We?

Bassam Adnan; Matteo Esposito; Davide Taibi; Karthik Vaidhyanathan

TL;DR

This work examines whether AI agents can generate functional microservices and how different forms of contextual information influence their performance, analyzing the generated services for functional correctness, code quality, and efficiency.

Abstract

LLMs have advanced code generation, but their use for generating microservices with explicit dependencies and API contracts remains understudied. We examine whether AI agents can generate functional microservices and how different forms of contextual information influence their performance. We assess 144 generated microservices across 3 agents, 4 projects, 2 prompting strategies, and 2 scenarios. Incremental generation operates within existing systems and is evaluated with unit tests. Clean state generation starts from requirements alone and is evaluated with integration tests. We analyze functional correctness, code quality, and efficiency. Minimal prompts outperformed detailed ones in incremental generation, with 50-76% unit test pass rates. Clean state generation produced higher integration test pass rates (81-98%), indicating strong API contract adherence. Generated code showed lower complexity than human baselines. Generation times varied widely across agents, averaging 6-16 minutes per service. AI agents can produce microservices with maintainable code, yet inconsistent correctness and reliance on human oversight show that fully autonomous microservice generation is not yet achievable.
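The study size reported above can be sanity-checked with a little arithmetic. The sketch below enumerates the design grid from the counts in the abstract; the agent and project labels are placeholders (the abstract does not name them), and the per-configuration figure assumes a balanced design, which the abstract does not state explicitly.

```python
from itertools import product

# Placeholder labels: the abstract gives only the counts, not the names.
agents = [f"agent_{i}" for i in range(1, 4)]        # 3 agents
projects = [f"project_{i}" for i in range(1, 5)]    # 4 projects
prompts = ["minimal", "detailed"]                   # 2 prompting strategies
scenarios = ["incremental", "clean_state"]          # 2 generation scenarios

# Every combination of agent, project, prompt, and scenario.
configurations = list(product(agents, projects, prompts, scenarios))

total_generated = 144  # generated microservices reported in the abstract

# Assumption: services are spread evenly across configurations.
per_configuration, remainder = divmod(total_generated, len(configurations))

print(len(configurations), per_configuration, remainder)  # prints "48 3 0"
```

Under that balanced-design assumption, the 48 configurations and 144 generated services imply three microservices per configuration, that is, three target services per project.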

Paper Structure (29 sections, 8 figures, 4 tables)

Introduction
Background and Related Work
Agentic AI for Code Generation
Related Works
Study Design
Goal
Research Questions
Experiment Workflow
Project Selection
LLM Agent Selection
Pilot Study and Prompt Development
Generation Scenarios
Microservice Generation
Evaluation Metrics
Results
...and 14 more sections

Figures (8)

Figure 1: Overview of the Study Design
Figure 2: Example of structural mismatch in Clean State generation (Train-Ticket payment service): baseline tests expect com.trainticket.* package with Money and AddMoneyRepository classes, but Codex generated payment.* package with MoneyTransaction and MoneyTransactionRepository, causing compilation failures despite functional correctness
Figure 3: Code Quality Metrics comparison: Lines of Code (LoC), Cyclomatic Complexity (CycC), and Cognitive Complexity (CogC). Top row shows comparison by agent, bottom row shows comparison by configuration.
Figure 4: Time distribution across agents and configurations
Figure 5: Cost distribution across agents and configurations
...and 3 more figures
