Table of Contents
Fetching ...

Intent Formalization: A Grand Challenge for Reliable Coding in the Age of AI Agents

Shuvendu K. Lahiri

Abstract

Agentic AI systems can now generate code with remarkable fluency, but a fundamental question remains: \emph{does the generated code actually do what the user intended?} The gap between informal natural language requirements and precise program behavior -- the \emph{intent gap} -- has always plagued software engineering, but AI-generated code amplifies it to an unprecedented scale. This article argues that \textbf{intent formalization} -- the translation of informal user intent into a set of checkable formal specifications -- is the key challenge that will determine whether AI makes software more reliable or merely more abundant. Intent formalization offers a tradeoff spectrum suitable to the reliability needs of different contexts: from lightweight tests that disambiguate likely misinterpretations, through full functional specifications for formal verification, to domain-specific languages from which correct code is synthesized automatically. The central bottleneck is \emph{validating specifications}: since there is no oracle for specification correctness other than the user, we need semi-automated metrics that can assess specification quality with or without code, through lightweight user interaction and proxy artifacts such as tests. We survey early research that demonstrates the \emph{potential} of this approach: interactive test-driven formalization that improves program correctness, AI-generated postconditions that catch real-world bugs missed by prior methods, and end-to-end verified pipelines that produce provably correct code from informal specifications. We outline the open research challenges -- scaling beyond benchmarks, achieving compositionality over changes, metrics for validating specifications, handling rich logics, designing human-AI specification interactions -- that define a research agenda spanning AI, programming languages, formal methods, and human-computer interaction.

Intent Formalization: A Grand Challenge for Reliable Coding in the Age of AI Agents

Abstract

Agentic AI systems can now generate code with remarkable fluency, but a fundamental question remains: \emph{does the generated code actually do what the user intended?} The gap between informal natural language requirements and precise program behavior -- the \emph{intent gap} -- has always plagued software engineering, but AI-generated code amplifies it to an unprecedented scale. This article argues that \textbf{intent formalization} -- the translation of informal user intent into a set of checkable formal specifications -- is the key challenge that will determine whether AI makes software more reliable or merely more abundant. Intent formalization offers a tradeoff spectrum suitable to the reliability needs of different contexts: from lightweight tests that disambiguate likely misinterpretations, through full functional specifications for formal verification, to domain-specific languages from which correct code is synthesized automatically. The central bottleneck is \emph{validating specifications}: since there is no oracle for specification correctness other than the user, we need semi-automated metrics that can assess specification quality with or without code, through lightweight user interaction and proxy artifacts such as tests. We survey early research that demonstrates the \emph{potential} of this approach: interactive test-driven formalization that improves program correctness, AI-generated postconditions that catch real-world bugs missed by prior methods, and end-to-end verified pipelines that produce provably correct code from informal specifications. We outline the open research challenges -- scaling beyond benchmarks, achieving compositionality over changes, metrics for validating specifications, handling rich logics, designing human-AI specification interactions -- that define a research agenda spanning AI, programming languages, formal methods, and human-computer interaction.
Paper Structure (20 sections, 3 figures)

This paper contains 20 sections, 3 figures.

Figures (3)

  • Figure 1: The intent gap in software development. Top: Traditional AI code generation translates informal, ambiguous natural language directly into operational code, leaving a large semantic gap. Bottom: Formal specifications serve as an intermediate "what" layer, reducing the gap and enabling enforcement through testing and verification.
  • Figure 2: The spectrum of intent formalization. Specifications (top row) range from partial (tests) to complete (domain-specific languages)---all are formal and checkable, differing in correctness coverage. Tests and code contracts are checked dynamically; logical contracts require static verification via a program verifier; DSLs enable verified synthesis of correct-by-construction code. LLMs can help generate artifacts at every level.
  • Figure 3: The TiCoder interactive workflow for test-driven user-intent formalization. The developer provides a natural language prompt; the LLM generates candidate code and tests. The user iteratively approves or rejects tests, which prune and rank candidates. This improves correct evaluation of AI code from 40% to 84% fakhoury2024tse.