Table of Contents
Fetching ...

Adversarial Bug Reports as a Security Risk in Language Model-Based Automated Program Repair

Piotr Przymus, Andreas Happe, Jürgen Cito

TL;DR

This paper exposes a novel security risk in language model–based automated program repair by showing that adversarial bug reports can reliably induce insecure or wasteful patches. Through a formal threat model, an attack framework, and an empirical study using 51 crafted bug reports against a leading APR system, the authors demonstrate that current pre- and post-APR defenses only partially mitigate these attacks, with attacker-aligned patches produced in the majority of cases. They reveal a pronounced cost asymmetry favoring attackers: generating adversarial inputs is inexpensive, while defending and validating patches incurs substantially higher costs. The work provides a prototype framework for automated adversarial bug-report generation, discusses practical defense configurations, and offers recommendations for building security-resilient APR systems, including structured prompts, runtime isolation, HITL review, and patch provenance tracking.

Abstract

Large Language Model (LLM) - based Automated Program Repair (APR) systems are increasingly integrated into modern software development workflows, offering automated patches in response to natural language bug reports. However, this reliance on untrusted user input introduces a novel and underexplored attack surface. In this paper, we investigate the security risks posed by adversarial bug reports -- realistic-looking issue submissions crafted to mislead APR systems into producing insecure or harmful code changes. We develop a comprehensive threat model and conduct an empirical study to evaluate the vulnerability of APR systems to such attacks. Our demonstration comprises 51 adversarial bug reports generated across a spectrum of strategies, ranging from manual curation to fully automated pipelines. We test these against a leading LLM-based APR system and assess both pre-repair defenses (e.g., LlamaGuard variants, PromptGuard variants, Granite-Guardian, and custom LLM filters) and post-repair detectors (GitHub Copilot, CodeQL). Our findings show that current defenses are insufficient: 90% of crafted bug reports triggered attacker-aligned patches. The best pre-repair filter blocked only 47%, while post-repair analysis -- often requiring human oversight -- was effective in just 58% of cases. To support scalable security testing, we introduce a prototype framework for automating the generation of adversarial bug reports. Our analysis exposes a structural asymmetry: generating adversarial inputs is inexpensive, while detecting or mitigating them remains costly and error-prone. We conclude with recommendations for improving the robustness of APR systems against adversarial misuse and highlight directions for future work on secure APR.

Adversarial Bug Reports as a Security Risk in Language Model-Based Automated Program Repair

TL;DR

This paper exposes a novel security risk in language model–based automated program repair by showing that adversarial bug reports can reliably induce insecure or wasteful patches. Through a formal threat model, an attack framework, and an empirical study using 51 crafted bug reports against a leading APR system, the authors demonstrate that current pre- and post-APR defenses only partially mitigate these attacks, with attacker-aligned patches produced in the majority of cases. They reveal a pronounced cost asymmetry favoring attackers: generating adversarial inputs is inexpensive, while defending and validating patches incurs substantially higher costs. The work provides a prototype framework for automated adversarial bug-report generation, discusses practical defense configurations, and offers recommendations for building security-resilient APR systems, including structured prompts, runtime isolation, HITL review, and patch provenance tracking.

Abstract

Large Language Model (LLM) - based Automated Program Repair (APR) systems are increasingly integrated into modern software development workflows, offering automated patches in response to natural language bug reports. However, this reliance on untrusted user input introduces a novel and underexplored attack surface. In this paper, we investigate the security risks posed by adversarial bug reports -- realistic-looking issue submissions crafted to mislead APR systems into producing insecure or harmful code changes. We develop a comprehensive threat model and conduct an empirical study to evaluate the vulnerability of APR systems to such attacks. Our demonstration comprises 51 adversarial bug reports generated across a spectrum of strategies, ranging from manual curation to fully automated pipelines. We test these against a leading LLM-based APR system and assess both pre-repair defenses (e.g., LlamaGuard variants, PromptGuard variants, Granite-Guardian, and custom LLM filters) and post-repair detectors (GitHub Copilot, CodeQL). Our findings show that current defenses are insufficient: 90% of crafted bug reports triggered attacker-aligned patches. The best pre-repair filter blocked only 47%, while post-repair analysis -- often requiring human oversight -- was effective in just 58% of cases. To support scalable security testing, we introduce a prototype framework for automating the generation of adversarial bug reports. Our analysis exposes a structural asymmetry: generating adversarial inputs is inexpensive, while detecting or mitigating them remains costly and error-prone. We conclude with recommendations for improving the robustness of APR systems against adversarial misuse and highlight directions for future work on secure APR.

Paper Structure

This paper contains 39 sections, 1 equation, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Standard APR pipeline. Black arrows indicate the normal flow from bug report to patch. Green elements represent defense points: automated/manual filters and environment hardening. Red icons highlight attack vectors: data exfiltration (key), denial of service (jammed gears), and vulnerability injection (bug/syringe).
  • Figure 2: Illustrative threat scenario: A malicious bug report bypasses initial filters and triggers APR via SWE-agent. The generated patch reintroduces a previously fixed vulnerability. Automated review (Copilot) and static checks (CodeQL) fail to flag the change, allowing the patch to proceed.
  • Figure 3: Template used for attack Revert CVE. fcomit will be replaced with filtered contents of commit and metadata.
  • Figure 4: Template used for pre-APR issue classification. bug_report will be replaced with the markdown-formatted issue text.