Understanding Formal Reasoning Failures in LLMs as Abstract Interpreters

Jacqueline L. Mitchell; Brian Hyeongseok Kim; Chenyu Zhou; Chao Wang

TL;DR

The paper probes whether large language models can reason about program semantics through the lens of abstract interpretation, focusing on invariant generation. It introduces two prompting strategies, Compositional and Transitional, to elicit step-by-step abstract-interpretation traces and evaluates them across four state-of-the-art LLMs on 22 SV-COMP benchmarks, verifying invariants with UAutomizer. The study reveals that LLMs can generate sound invariants in many cases but exhibit recurring reasoning errors in control-flow understanding, fixpoint computation, and operation semantics, with pronounced model- and program-dependent differences between strategies. The work highlights concrete thematic errors, analyzes their sources, and suggests opportunities for better prompting, modular context management, and possibly architectural adaptations to make LLMs more reliable for verification tasks. Overall, the findings illuminate both the potential and the current limits of using LLMs for formal verification tasks and guide future research toward more robust, reasoning-capable systems for invariant generation.
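To make concrete what an abstract-interpretation trace and its fixpoint computation involve, here is a minimal sketch of a textbook interval analysis for the single loop x = 0; while (x < 100) x++;. It is our own illustration, not the paper's Compositional or Transitional prompting strategy or one of its SV-COMP programs, and the names Interval and analyze_loop are hypothetical. Widening forces the ascending iteration to terminate, and one narrowing pass recovers the precise loop invariant.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Abstract value [lo, hi] for an integer variable; bounds may be +/-inf."""
    lo: float
    hi: float

    def is_empty(self) -> bool:
        return self.lo > self.hi

    def join(self, other: "Interval") -> "Interval":
        # Least upper bound: the smallest interval containing both.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def widen(self, other: "Interval") -> "Interval":
        # Standard interval widening: any bound that is still growing jumps to infinity.
        lo = self.lo if other.lo >= self.lo else -math.inf
        hi = self.hi if other.hi <= self.hi else math.inf
        return Interval(lo, hi)

    def narrow(self, other: "Interval") -> "Interval":
        # Standard interval narrowing: only infinite bounds are refined.
        lo = other.lo if self.lo == -math.inf else self.lo
        hi = other.hi if self.hi == math.inf else self.hi
        return Interval(lo, hi)

    def assume_lt(self, c: int) -> "Interval":
        # Filter by the guard x < c (integers, so x <= c - 1).
        return Interval(self.lo, min(self.hi, c - 1))

    def assume_ge(self, c: int) -> "Interval":
        # Filter by the negated guard x >= c.
        return Interval(max(self.lo, c), self.hi)

    def add(self, k: int) -> "Interval":
        return Interval(self.lo + k, self.hi + k)


def analyze_loop(init: Interval, bound: int, step: int):
    """Interval analysis of: x = init; while (x < bound) x += step;
    Returns (loop-head invariant, state at loop exit)."""

    def transfer(head: Interval) -> Interval:
        # Loop head = entry state joined with the state after one body execution.
        body_in = head.assume_lt(bound)
        if body_in.is_empty():
            return init
        return init.join(body_in.add(step))

    # Ascending phase with widening until a post-fixpoint is reached.
    head = init
    while (nxt := head.widen(transfer(head))) != head:
        head = nxt
    # Descending phase with narrowing to recover precision lost by widening.
    while (nxt := head.narrow(transfer(head))) != head:
        head = nxt
    return head, head.assume_ge(bound)


inv, exit_state = analyze_loop(Interval(0, 0), bound=100, step=1)
print(inv, exit_state)
```

Here the loop head goes from [0, 0] to [0, +inf] after widening, and narrowing then tightens it to Interval(lo=0, hi=100), that is, 0 <= x <= 100, with Interval(lo=100, hi=100) at loop exit. Steps like the widening and narrowing sequence are where a step-by-step trace can go astray, which is one of the error categories (fixpoint computation) the study reports.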

Abstract

Large language models (LLMs) are increasingly used for program verification, yet little is known about how they reason about program semantics during this process. In this work, we focus on abstract-interpretation-based reasoning for invariant generation and introduce two novel prompting strategies that aim to elicit such reasoning from LLMs. We evaluate these strategies across several state-of-the-art LLMs on 22 programs from SV-COMP, a benchmark suite widely used in software verification. We analyze both the soundness of the generated invariants and the key thematic patterns in the models' reasoning errors. This work aims to highlight new research opportunities at the intersection of LLMs and program verification, both for applying LLMs to verification tasks and for advancing their reasoning capabilities in this domain.
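The abstract evaluates the soundness of generated invariants separately from the reasoning that produced them. As a hedged illustration of what soundness of a loop invariant requires, the sketch below checks a candidate for initiation (it holds on loop entry) and consecution (each iteration preserves it) by brute force over a bounded integer range. This is not how the paper checks invariants (the study verifies them with UAutomizer), and the function check_inductive and its default range are hypothetical; a bounded search can refute a candidate but cannot prove it for all integers.

```python
from typing import Callable, Optional, Tuple

Pred = Callable[[int], bool]


def check_inductive(
    inv: Pred,
    init: int,
    guard: Pred,
    body: Callable[[int], int],
    lo: int = -1000,
    hi: int = 1000,
) -> Optional[Tuple[str, int]]:
    """Bounded check that `inv` is an inductive invariant of
    x = init; while (guard(x)) x = body(x);
    Returns None if no violation is found in [lo, hi], else (failed condition, witness)."""
    # Initiation: the invariant must hold when the loop is first reached.
    if not inv(init):
        return ("initiation", init)
    # Consecution: inv(x) and guard(x) must imply inv(body(x)) for every x in range.
    for x in range(lo, hi + 1):
        if inv(x) and guard(x) and not inv(body(x)):
            return ("consecution", x)
    return None


# The loop x = 0; while (x < 100) x++;
def guard(x: int) -> bool:
    return x < 100


def step(x: int) -> int:
    return x + 1


print(check_inductive(lambda x: 0 <= x <= 100, 0, guard, step))  # None
print(check_inductive(lambda x: 0 <= x <= 99, 0, guard, step))   # ('consecution', 99)
```

The too-strong candidate 0 <= x <= 99 fails consecution at x = 99, because the body then produces x = 100. Passing this check is also not sufficient for a useful invariant: it must additionally imply the property being verified, which the sketch does not test.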
