SpecMind: Cognitively Inspired, Interactive Multi-Turn Framework for Postcondition Inference

Cuong Chi Le, Minh V. T Pham, Tung Vu Duy, Cuong Duc Van, Huy N. Phan, Hoang N. Phan, Tien N. Nguyen

Abstract

Specifications are vital for ensuring program correctness, yet writing them manually remains challenging and time-intensive. Recent large language model (LLM)-based methods have shown success in generating specifications such as postconditions, but existing single-pass prompting often yields inaccurate results. In this paper, we present SpecMind, a novel framework for postcondition generation that treats LLMs as interactive and exploratory reasoners rather than one-shot generators. SpecMind employs feedback-driven multi-turn prompting, enabling the model to iteratively refine candidate postconditions by incorporating implicit and explicit correctness feedback, while autonomously deciding when to stop. Through these exploratory attempts, the process fosters deeper code comprehension and improves alignment with true program behavior. Our empirical evaluation shows that SpecMind significantly outperforms state-of-the-art approaches in both accuracy and completeness of generated postconditions.

Paper Structure (23 sections, 2 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Example of task #69 from EvalPlus with postconditions from Single-pass (nl2postcond), Greedy Multi-turn, and Exploratory Multi-turn. Blue blocks show model reasoning (omitted for all but turn 5 of Exploratory Multi-turn for space reasons); red, orange, and gray blocks show submission attempts, exploration attempts, and feedback, respectively. ✓: correct and complete postcondition, $\times$: otherwise.
  • Figure 2: Feedback-Driven Exploratory Multi-Turn Algorithm with Completeness Threshold
  • Figure 3: Template prompt for Greedy and Exploratory.
  • Figure 4: Efficiency for configurations with $\mu = 12$.
  • Figure 5: Frequency distribution analysis from seven reasoning categories across attempts (RQ2)
  • ...and 3 more figures