
DebugHarness: Emulating Human Dynamic Debugging for Autonomous Program Repair

Maolin Sun, Yibiao Yang, Xuanlin Liu, Yuming Zhou, Baowen Xu

Abstract

Patching severe security flaws in complex software remains a major challenge. While automated tools like fuzzers efficiently discover bugs, fixing deep-rooted low-level faults (e.g., use-after-free and memory corruption) still requires labor-intensive manual analysis by experts. Emerging Large Language Model (LLM) agents attempt to automate this pipeline, but they typically treat bug fixing as a purely static code-generation task. Relying solely on static artifacts, these methods miss the dynamic execution context strictly necessary for diagnosing intricate memory safety violations. To overcome these limitations, we introduce DebugHarness, an autonomous LLM-powered debugging agent harness that resolves complex vulnerabilities by emulating the interactive debugging practices of human systems engineers. Instead of merely examining static code, DebugHarness actively queries the live runtime environment. Driven by a reproducible crash, it utilizes a pattern-guided investigation strategy to formulate hypotheses, interactively probes program memory states and execution paths, and synthesizes patches via a closed-loop validation cycle. We evaluate DebugHarness on SEC-bench, a rigorous dataset of real-world C/C++ security vulnerabilities. DebugHarness successfully patches approximately 90% of the evaluated bugs. This yields a relative improvement of over 30% compared to state-of-the-art baselines, demonstrating that dynamic debugging significantly enhances LLM diagnostic capabilities. Overall, DebugHarness establishes a novel paradigm for automated program repair, bridging the gap between static LLM reasoning and the dynamic intricacies of low-level systems programming.

Paper Structure

This paper contains 23 sections, 6 figures, 3 tables, and 1 algorithm.

Figures (6)

  • Figure 1: CVE-2022-1286 code snippets showing (a) a comparison of resolution workflows. Static agents are limited to analyzing the ASan report and source code, leading them to stall at the symptom site and propose superficial patches. In contrast, DebugHarness mimics true human debugging by utilizing dynamic memory introspection and watchpoints to trace the stale pointer back to its root cause in a different file. (b) The symptomatic indirect call in vm.c. (c) The actual root cause and patch involving missing cache invalidation in class.c.
  • Figure 2: Overview of DebugHarness's workflow.
  • Figure 3: The prompt template for signature-driven initialization. DebugHarness injects error-class-specific troubleshooting guidelines based on the crash signature.
  • Figure 4: Cumulative distribution of iteration counts for repairs across different LLM backbones.
  • Figure 5: Venn diagram showing the overlap of successfully resolved vulnerabilities across the three LLM backbones.
  • ...and 1 more figure