AutoCodeSherpa: Symbolic Explanations in AI Coding Agents

Sungmin Kang, Haifeng Ruan, Abhik Roychoudhury

TL;DR

AutoCodeSherpa tackles the trust gap in autonomous coding agents by producing executable symbolic explanations for software bugs. It defines a triplet of conditions (input $I$, infection $F_L$, and output $O$) and builds them via a multi-agent pipeline that combines generalization, symbolization, and refinement of a property-based test with program-state reasoning. The approach enables automatic patch validation, improves the efficacy of other coding agents, and generalizes across multiple LLMs, achieving high condition accuracies (≈$85.7\%$ for inputs, ≈$79.0\%$ for outputs, ≈$79.7\%$ for infection) and substantial practical gains (e.g., a $60.7\%$ rise in plausible patches for Agentless and $123\%$ more incorrect patches rejected than baselines). The results suggest executable symbolic explanations can enhance trust, patch quality, and cross-agent collaboration in scalable AI-assisted software engineering. Overall, AutoCodeSherpa advances trustworthy automated debugging by making bug explanations testable and executable.

Abstract

Large language model (LLM) agents integrate external tools with one or more LLMs to accomplish specific tasks. Agents have rapidly been adopted by developers, and they are starting to be deployed in industrial workflows, for example to fix static analysis issues reported by the widely used SonarQube static analyzer. However, the growing importance of agents means their actions carry greater impact and potential risk. Thus, to use them at scale, an additional layer of trust and evidence is necessary. This work presents AutoCodeSherpa, a technique that provides explanations of software issues in the form of symbolic formulae. Inspired by the reachability, infection, and propagation model of software faults, the explanations are composed of input, infection, and output conditions, collectively providing a specification of the issue. In practice, the symbolic explanation is implemented as a combination of a property-based test (PBT) and program-internal symbolic expressions. Critically, this means our symbolic explanations are executable and can be automatically evaluated, unlike natural language explanations. Experiments show the generated conditions are highly accurate. For example, input conditions from AutoCodeSherpa had an accuracy of 85.7%. This high accuracy makes symbolic explanations particularly useful in two scenarios. First, the explanations can be used in automated issue resolution environments to decide whether to accept or reject patches from issue resolution agents; AutoCodeSherpa could reject 2x as many incorrect patches as baselines did. Second, as agentic AI approaches continue to develop, program-analysis-driven explanations like ours can be provided to other LLM-based repair techniques that do not themselves employ analysis, in order to improve their output. In our experiments, our symbolic explanations could improve the plausible patch generation rate of the Agentless technique by 60%.
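To make the idea of an executable explanation concrete, the following is a minimal sketch, not AutoCodeSherpa's actual output or pipeline. It assumes a hypothetical buggy function `buggy_mean`, a hypothetical candidate patch `patched_mean`, and a hand-rolled random-input loop standing in for a property-based testing framework. The three predicates play the roles of the input, infection, and output conditions; the `probe` callback is an illustrative stand-in for observing program-internal state at the suspected fault location, and the acceptance rule in `accept_patch` is one plausible reading of how such an explanation could gate patches.

```python
import random

TOL = 1e-9  # tolerance for floating-point comparisons

def buggy_mean(xs, probe=None):
    """Hypothetical buggy function: floor division truncates the mean."""
    total = sum(xs)
    mean = total // len(xs)  # location L: the suspected fault
    if probe is not None:
        probe({"total": total, "n": len(xs), "mean": mean})
    return mean

def patched_mean(xs, probe=None):
    """Hypothetical candidate patch: uses true division instead."""
    total = sum(xs)
    mean = total / len(xs)
    if probe is not None:
        probe({"total": total, "n": len(xs), "mean": mean})
    return mean

def input_condition(xs):
    # Input condition: non-empty lists whose sum is not divisible by their length.
    return len(xs) > 0 and sum(xs) % len(xs) != 0

def infection_condition(state):
    # Infection condition at L: the internal mean no longer reproduces the total.
    return abs(state["mean"] * state["n"] - state["total"]) > TOL

def output_condition(xs, result):
    # Output condition: the returned value differs from the true mean.
    return abs(result - sum(xs) / len(xs)) > TOL

def run_explanation(fn, trials=500, seed=0):
    """Sample random inputs and count how often each condition holds."""
    rng = random.Random(seed)
    stats = {"triggering": 0, "infected": 0, "failing": 0}
    for _ in range(trials):
        xs = [rng.randint(-20, 20) for _ in range(rng.randint(1, 5))]
        if not input_condition(xs):
            continue  # only inputs covered by the explanation are relevant
        states = []
        result = fn(xs, probe=states.append)
        stats["triggering"] += 1
        stats["infected"] += any(infection_condition(s) for s in states)
        stats["failing"] += output_condition(xs, result)
    return stats

def accept_patch(fn):
    """Accept a patch only if no triggering input still shows the symptom."""
    stats = run_explanation(fn)
    return stats["triggering"] > 0 and stats["failing"] == 0
```

Calling `run_explanation(buggy_mean)` exercises the explanation against the original program, while `accept_patch(patched_mean)` and `accept_patch(buggy_mean)` show how the same conditions could be reused to accept or reject a candidate fix. Because the conditions are plain predicates, they can be evaluated automatically, which is the property that distinguishes them from a natural language explanation.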

Paper Structure

This paper contains 37 sections, 7 figures, 6 tables.

Figures (7)

  • Figure 1: An example PBT
  • Figure 2: Overview of AutoCodeSherpa with a real example simplified for clarity; PBT is for property-based test.
  • Figure 3: The buggy function involved in our running example.
  • Figure 4: Output from the generalization phase of PBT generation.
  • Figure 5: Detailed condition generation flowcharts for AutoCodeSherpa.
  • ...and 2 more figures

Theorems & Definitions (1)

  • Definition 1: Symbolic explanation