Table of Contents
Fetching ...

Towards Large Language Model Aided Program Refinement

Yufan Cai, Zhe Hou, Xiaokun Luan, David Miguel Sanan Baena, Yun Lin, Jun Sun, Jin Song Dong

TL;DR

LLM4PR is a tool that combines formal program refinement techniques with informal LLM-based methods to transform the specification to preconditions and postconditions, and verify that the generated code satisfies the conditions of refinement calculus, thus guaranteeing the correctness of the code.

Abstract

Program refinement involves correctness-preserving transformations from formal high-level specification statements into executable programs. Traditional verification tool support for program refinement is highly interactive and lacks automation. On the other hand, the emergence of large language models (LLMs) enables automatic code generations from informal natural language specifications. However, code generated by LLMs is often unreliable. Moreover, the opaque procedure from specification to code provided by LLM is an uncontrolled black box. We propose LLM4PR, a tool that combines formal program refinement techniques with informal LLM-based methods to (1) transform the specification to preconditions and postconditions, (2) automatically build prompts based on refinement calculus, (3) interact with LLM to generate code, and finally, (4) verify that the generated code satisfies the conditions of refinement calculus, thus guaranteeing the correctness of the code. We have implemented our tool using GPT4, Coq, and Coqhammer, and evaluated it on the HumanEval and EvalPlus datasets.

Towards Large Language Model Aided Program Refinement

TL;DR

LLM4PR is a tool that combines formal program refinement techniques with informal LLM-based methods to transform the specification to preconditions and postconditions, and verify that the generated code satisfies the conditions of refinement calculus, thus guaranteeing the correctness of the code.

Abstract

Program refinement involves correctness-preserving transformations from formal high-level specification statements into executable programs. Traditional verification tool support for program refinement is highly interactive and lacks automation. On the other hand, the emergence of large language models (LLMs) enables automatic code generations from informal natural language specifications. However, code generated by LLMs is often unreliable. Moreover, the opaque procedure from specification to code provided by LLM is an uncontrolled black box. We propose LLM4PR, a tool that combines formal program refinement techniques with informal LLM-based methods to (1) transform the specification to preconditions and postconditions, (2) automatically build prompts based on refinement calculus, (3) interact with LLM to generate code, and finally, (4) verify that the generated code satisfies the conditions of refinement calculus, thus guaranteeing the correctness of the code. We have implemented our tool using GPT4, Coq, and Coqhammer, and evaluated it on the HumanEval and EvalPlus datasets.

Paper Structure

This paper contains 50 sections, 8 theorems, 4 equations, 5 figures, 5 tables.

Key Result

lemma thmcounterlemma

Let $x_0$ denote the initial value of variant $x$, if$\ (x=x_0) \land P \Rrightarrow Q$, then the specification$\ x: [P, Q] \sqsubseteq$ Skip.

Figures (5)

  • Figure 1: Problematic code generated by GPT4 and Copilot for computing square root. The upper two programs are wrong in the case $N < 1$ due to the wrong upper bound initialization. The repaired code still fails in several cases when the variable x goes to the fixed point but does not terminate the loop due to the floating-point precision error. The last code contains the refinement constraints that point out the above problems.
  • Figure 2: Refinement laws semantics of the specification language and program language.
  • Figure 3: Overview of LLM4PR that combines LLMs and program refinement.
  • Figure 4: The framework of the specification tree and the program refinement library.
  • Figure 5: Example of a square root program in program refinement. The statements with # are the explanation and the statements with // are the specifications.

Theorems & Definitions (26)

  • definition thmcounterdefinition: Strengthen Postcondition Law
  • definition thmcounterdefinition: Weaken Precondition Law
  • definition thmcounterdefinition: Skip Law
  • definition thmcounterdefinition: Sequential Composition Law
  • definition thmcounterdefinition: Assignment Law
  • definition thmcounterdefinition: Alternation Law
  • definition thmcounterdefinition: Iteration Law
  • definition thmcounterdefinition: Expand Law
  • definition thmcounterdefinition: Procedure
  • definition thmcounterdefinition: Procedure Value Specification
  • ...and 16 more