Using a Feedback Loop for LLM-based Infrastructure as Code Generation

Mayur Amarnath Palavalli, Mark Santolucito

TL;DR

The paper investigates whether an LLM agent can generate AWS CloudFormation Infrastructure as Code (IaC) and improve it through a feedback loop that returns cfn-lint error and warning messages to the model. It builds a benchmark of 165 templates from 33 prompts to evaluate iterative revisions and reports that the feedback loop's effectiveness decays exponentially with each iteration, plateauing around iteration 5–6. The findings indicate that LLM-assisted IaC generation benefits from automated feedback, but that scalable, semantically correct infrastructure requires additional validation strategies and improved feedback mechanisms. The work highlights the gap between syntactic validity and semantic correctness in IaC generation and points to future research on robust, production-grade tooling.
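
One way to picture "decays exponentially, then plateaus" is a simple saturating model. This is an illustrative assumption, not a formula taken from the paper: $E_k$ is a hypothetical mean number of cfn-lint errors remaining after iteration $k$, $E_\infty$ is the plateau level, and $\lambda > 0$ is a decay rate.

```latex
\begin{align}
E_k &= E_\infty + (E_0 - E_\infty)\, e^{-\lambda k}, \qquad k = 0, 1, 2, \dots \\
\Delta_k &= E_k - E_{k+1} = (E_0 - E_\infty)\,\bigl(1 - e^{-\lambda}\bigr)\, e^{-\lambda k}
\end{align}
```

Under this assumed model, each iteration removes a constant fraction $1 - e^{-\lambda}$ of the remaining gap to the plateau, so the absolute improvement $\Delta_k$ shrinks geometrically and later iterations contribute almost nothing.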

Abstract

Code generation with Large Language Models (LLMs) has helped to increase software developer productivity in coding tasks, but has yet to have significant impact on the tasks of software developers that surround this code. In particular, the challenge of infrastructure management remains an open question. We investigate the ability of an LLM agent to construct infrastructure using the Infrastructure as Code (IaC) paradigm. We particularly investigate the use of a feedback loop that returns errors and warnings on the generated IaC to allow the LLM agent to improve the code. We find that, for each iteration of the loop, its effectiveness decreases exponentially until it plateaus at a certain point and becomes ineffective.
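
The feedback loop described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `feedback_loop`, `run_cfn_lint`, and the `generate` callable (standing in for the LLM call) are hypothetical names, the cap of six iterations is only loosely motivated by the reported plateau, and `run_cfn_lint` assumes the `cfn-lint` CLI is installed and emits a JSON list of findings with `Level` and `Message` fields when run with `--format json`.

```python
import json
import os
import subprocess
import tempfile
from typing import Callable, List, Tuple

# Hypothetical signatures: a linter maps a template to a list of messages,
# and a generator maps (prompt, previous feedback) to a new template.
Linter = Callable[[str], List[str]]
Generator = Callable[[str, List[str]], str]


def run_cfn_lint(template: str) -> List[str]:
    """Lint a JSON template with the cfn-lint CLI (assumed to be on PATH)."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
        handle.write(template)
        path = handle.name
    try:
        proc = subprocess.run(
            ["cfn-lint", "--format", "json", path],
            capture_output=True,
            text=True,
        )
        # Assumption: stdout holds a JSON list of findings (empty if clean).
        findings = json.loads(proc.stdout or "[]")
    finally:
        os.unlink(path)
    return [f"{item.get('Level')}: {item.get('Message')}" for item in findings]


def feedback_loop(
    prompt: str,
    generate: Generator,
    lint: Linter,
    max_iterations: int = 6,
) -> Tuple[str, List[int]]:
    """Regenerate a template until the linter is silent or the cap is hit.

    Returns the last template and the number of findings per iteration.
    """
    feedback: List[str] = []
    history: List[int] = []
    template = ""
    for _ in range(max_iterations):
        template = generate(prompt, feedback)  # LLM call, given prior feedback
        feedback = lint(template)              # errors/warnings on the result
        history.append(len(feedback))
        if not feedback:                       # nothing left to fix
            break
    return template, history
```

Passing the linter in as a callable keeps the loop testable without AWS tooling: `feedback_loop(prompt, my_llm_call, run_cfn_lint)` would use the real CLI, while a stub linter can be substituted in tests. The per-iteration counts in `history` are the kind of data a plot like Figure 5 would aggregate over trials.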

Paper Structure

This paper contains 11 sections, 5 figures.

Figures (5)

  • Figure 1: An example AWS CloudFormation JSON template.
  • Figure 2: An example error message from cfn-lint.
  • Figure 3: An example prompt from the official AWS CloudFormation Template Schema repository.
  • Figure 4: A diagram of the feedback loop: we provide the LLM with a prompt for an AWS CloudFormation file, which is run through cfn-lint to produce error/warning message(s) which are given back to the LLM.
  • Figure 5: A histogram of errors over multiple cfn-lint feedback iterations showing error bars representing the distribution over six trials.