Migrating Existing Container Workload to Kubernetes -- LLM Based Approach and Evaluation
Masaru Ueno, Tetsuya Uchiumi
TL;DR
This paper tackles the challenge of migrating Compose-based workloads to Kubernetes by evaluating LLM-driven manifest synthesis against a purpose-built microbenchmark. It introduces three quality criteria (correctness, context-groundedness, and consistency) and assesses prompts, model variants, and JSON-mode outputs across 50-sample setups. Key findings show that LLMs can produce accurate manifests for standard cases, but they often omit readability comments and struggle with atypical inputs. Structured JSON outputs and expert prompting improve stability, yet non-determinism still necessitates human QA and validation. The work provides an open benchmark and practical guidance for automating Kubernetes manifest generation from Compose inputs, with implications for DevOps tooling and future research on prompt tuning and postprocessing pipelines.
Abstract
Although Kubernetes has become a widespread open-source system that automates the management of containerized applications, its complexity can be a significant barrier, particularly for application developers unfamiliar with it. One approach employs large language models (LLMs) to assist developers in generating Kubernetes manifests; however, it is currently impossible to determine whether the output satisfies the given specifications and is comprehensible. In this study, we propose a benchmarking method for evaluating the effectiveness of LLMs in synthesizing manifests, using the Compose specification, a standard widely adopted by application developers, as input. The benchmark revealed that LLMs generally produce accurate results and compensate for simple gaps in the specification. However, we also observed that inline comments for readability were often omitted, and that completion accuracy was low for atypical inputs with unclear intentions.
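To make the input/output pair of the task concrete, the following is a minimal, hand-written sketch of how a single Compose service might map onto a Kubernetes Deployment and Service. It is not the authors' method and involves no LLM; it only illustrates the kind of translation being benchmarked. The function name `compose_service_to_manifests`, the choice of one replica, the `app` label, and the restriction to the short `HOST:CONTAINER` port syntax are all assumptions made for illustration.

```python
from typing import Any


def compose_service_to_manifests(name: str, service: dict[str, Any]) -> list[dict[str, Any]]:
    """Map one Compose service to a Deployment and, if ports are published, a Service.

    Hypothetical helper: handles only `image` and short-form `ports` entries.
    """
    labels = {"app": name}  # assumed labelling convention

    # Only the Compose short syntax "HOST:CONTAINER" is handled in this sketch.
    port_pairs = []
    for spec in service.get("ports", []):
        host, container_port = str(spec).split(":")
        port_pairs.append((int(host), int(container_port)))

    container: dict[str, Any] = {"name": name, "image": service["image"]}
    if port_pairs:
        container["ports"] = [{"containerPort": c} for _, c in port_pairs]

    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 1,  # assumed default; Compose input here gives no replica count
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }
    manifests = [deployment]

    if port_pairs:
        manifests.append({
            "apiVersion": "v1",
            "kind": "Service",
            "metadata": {"name": name},
            "spec": {
                "selector": labels,
                "ports": [{"port": h, "targetPort": c} for h, c in port_pairs],
            },
        })
    return manifests


# Example Compose input, written as the dict a YAML parser would produce.
compose = {"services": {"web": {"image": "nginx:1.25", "ports": ["8080:80"]}}}
manifests = [
    m
    for svc_name, svc in compose["services"].items()
    for m in compose_service_to_manifests(svc_name, svc)
]
```

Even this small case requires choices that the Compose file does not state, such as the replica count and the labels, which is the kind of gap-filling the abstract describes.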
