VeCoGen: Automating Generation of Formally Verified C Code with Large Language Models

Merlijn Sevenhuijsen; Khashayar Etemadi; Mattias Nyberg

TL;DR

This work addresses the problem of trusting LLM-generated code in safety-critical contexts by introducing VeCoGen, an automated system that combines LLM-based code generation with formal verification using ACSL specifications and Frama-C. VeCoGen follows a two-step process: it generates an initial set of candidate programs, then iteratively improves them using feedback from a compiler and a verifier, so that an accepted candidate is formally correct with respect to the specification. In experiments on 15 Codeforces problems, VeCoGen solves 13. Results improve when natural-language and formal specifications are used together and when more capable LLMs (e.g., GPT-4o) are employed, which points to the value of iterative refinement guided by formal feedback. The results indicate a viable path to automated generation of formally verified C code, with implications for safety-critical software pipelines and rigorous verification workflows.

Abstract

Large language models have demonstrated impressive capabilities in generating code, yet they often produce programs with flaws or deviations from intended behavior, limiting their suitability for safety-critical applications. To address this limitation, this paper introduces VeCoGen, a novel tool that combines large language models with formal verification to automate the generation of formally verified C programs. VeCoGen takes a formal specification in the ANSI/ISO C Specification Language (ACSL), a natural-language specification, and a set of test cases, and attempts to generate a verified program. The program-generation process consists of two steps. First, VeCoGen generates an initial set of candidate programs. Second, the tool iteratively improves on previously generated candidates. If a candidate program is proven to meet the formal specification, it is guaranteed to be correct with respect to that specification. We evaluate VeCoGen on 15 problems from Codeforces competitions, of which it solves 13. This work shows the potential of combining large language models with formal verification to automate program generation.
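
To make the inputs concrete, the following is a minimal sketch of what an ACSL-annotated C function and an accompanying set of test cases might look like. It is not taken from the paper or its benchmark: the function `abs_int`, the input bounds, and the helper `run_test_cases` are hypothetical names chosen purely for illustration. The contract in the `/*@ ... */` comment plays the role of the formal specification, and the function body stands in for a candidate program that a verifier would check against it.

```c
#include <assert.h>

/* Illustrative only: a hypothetical specification and candidate,
 * not one of the Codeforces problems used in the evaluation. */

/*@
  requires -1000 <= x <= 1000;            // assumed input bounds
  assigns \nothing;                       // no side effects on memory
  ensures \result >= 0;
  ensures \result == x || \result == -x;
*/
int abs_int(int x) {
    /* Candidate body: the bounds above rule out overflow in -x. */
    return x < 0 ? -x : x;
}

/* Hypothetical test cases of the kind supplied alongside the
 * specification; they exercise the candidate at run time, whereas
 * the ACSL contract is checked statically for all allowed inputs. */
void run_test_cases(void) {
    assert(abs_int(5) == 5);
    assert(abs_int(-5) == 5);
    assert(abs_int(0) == 0);
}
```

Assuming Frama-C with the WP plugin is installed, a file like this could be checked with a command along the lines of `frama-c -wp -wp-rte file.c`. If every proof obligation is discharged, the body satisfies the contract for all inputs within the stated bounds, which is a stronger guarantee than passing the three test cases.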
