Table of Contents
Fetching ...

AutoICE: Automatically Synthesizing Verifiable C Code via LLM-driven Evolution

Weilin Luo, Xueyi Liang, Haotian Deng, Yanan Liu, Hai Wan

TL;DR

AutoICE tackles the challenge of autoformalization by synthesizing verifiable C code annotated with ACSL using an LLM-driven evolutionary search. It introduces diverse initialization, collaborative crossover, and self-reflective mutation to explore implicit knowledge and mitigate error propagation from single-pass LLM generation. Evaluation across FM-ALPACA/FM-BENCH and developer-friendly variants shows AutoICE achieving up to 90.36% verification and superior performance over SOTA, with ablations confirming the contribution of each component. The work demonstrates practical potential for democratizing formal verification in safety-critical software by reducing reliance on detailed natural-language requirements and expert formal-methods expertise.

Abstract

Automatically synthesizing verifiable code from natural language requirements ensures software correctness and reliability while significantly lowering the barrier to adopting the techniques of formal methods. With the rise of large language models (LLMs), long-standing efforts at autoformalization have gained new momentum. However, existing approaches suffer from severe syntactic and semantic errors due to the scarcity of domain-specific pre-training corpora and often fail to formalize implicit knowledge effectively. In this paper, we propose AutoICE, an LLM-driven evolutionary search for synthesizing verifiable C code. It introduces the diverse individual initialization and the collaborative crossover to enable diverse iterative updates, thereby mitigating error propagation inherent in single-agent iterations. Besides, it employs the self-reflective mutation to facilitate the discovery of implicit knowledge. Evaluation results demonstrate the effectiveness of AutoICE: it successfully verifies $90.36$\% of code, outperforming the state-of-the-art (SOTA) approach. Besides, on a developer-friendly dataset variant, AutoICE achieves a $88.33$\% verification success rate, significantly surpassing the $65$\% success rate of the SOTA approach.

AutoICE: Automatically Synthesizing Verifiable C Code via LLM-driven Evolution

TL;DR

AutoICE tackles the challenge of autoformalization by synthesizing verifiable C code annotated with ACSL using an LLM-driven evolutionary search. It introduces diverse initialization, collaborative crossover, and self-reflective mutation to explore implicit knowledge and mitigate error propagation from single-pass LLM generation. Evaluation across FM-ALPACA/FM-BENCH and developer-friendly variants shows AutoICE achieving up to 90.36% verification and superior performance over SOTA, with ablations confirming the contribution of each component. The work demonstrates practical potential for democratizing formal verification in safety-critical software by reducing reliance on detailed natural-language requirements and expert formal-methods expertise.

Abstract

Automatically synthesizing verifiable code from natural language requirements ensures software correctness and reliability while significantly lowering the barrier to adopting the techniques of formal methods. With the rise of large language models (LLMs), long-standing efforts at autoformalization have gained new momentum. However, existing approaches suffer from severe syntactic and semantic errors due to the scarcity of domain-specific pre-training corpora and often fail to formalize implicit knowledge effectively. In this paper, we propose AutoICE, an LLM-driven evolutionary search for synthesizing verifiable C code. It introduces the diverse individual initialization and the collaborative crossover to enable diverse iterative updates, thereby mitigating error propagation inherent in single-agent iterations. Besides, it employs the self-reflective mutation to facilitate the discovery of implicit knowledge. Evaluation results demonstrate the effectiveness of AutoICE: it successfully verifies \% of code, outperforming the state-of-the-art (SOTA) approach. Besides, on a developer-friendly dataset variant, AutoICE achieves a \% verification success rate, significantly surpassing the \% success rate of the SOTA approach.

Paper Structure

This paper contains 35 sections, 2 equations, 8 figures, 5 tables, 2 algorithms.

Figures (8)

  • Figure 1: Traditional requirements vs. developer-friendly requirements.
  • Figure 2: An example of the prompts for the two-phase, where colors indicate role definition (blue), task description (green), task constraint (orange), and inputs (gray). We omit the inputs and mark them as '//'.
  • Figure 3: An example of the crossover prompt and a schematic diagram of the collaborative crossover, where colors indicate domain description (purple).
  • Figure 4: An example of the mutation prompt and a schematic diagram of the self-reflective mutation.
  • Figure 5: An example of the refinement prompt.
  • ...and 3 more figures