Trust Me, I Know This Function: Hijacking LLM Static Analysis using Bias

Shir Bernstein, David Beste, Daniel Ayzenshteyn, Lea Schonherr, Yisroel Mirsky

TL;DR

This work uncovers a previously overlooked vulnerability in LLM-based static code analysis: abstraction bias toward familiar code patterns can cause models to overlook small, deterministic bugs, letting an adversary change how the model interprets code without altering its runtime behavior. The authors formalize Familiar Pattern Attacks (FPAs), introduce an automated, black-box attack generator, and demonstrate transferability across models and programming languages, including against real-world code agents. They also explore defensive uses, show that the attacks persist even when models are warned, and discuss the limitations of deduplication and traditional static/dynamic defenses. The findings highlight the need for semantic-aware robustness in code understanding systems and outline dual-use implications for security and defense.

Abstract

Large Language Models (LLMs) are increasingly trusted to perform automated code review and static analysis at scale, supporting tasks such as vulnerability detection, summarization, and refactoring. In this paper, we identify and exploit a critical vulnerability in LLM-based code analysis: an abstraction bias that causes models to overgeneralize familiar programming patterns and overlook small, meaningful bugs. Adversaries can exploit this blind spot to hijack the control flow of the LLM's interpretation with minimal edits and without affecting actual runtime behavior. We refer to this attack as a Familiar Pattern Attack (FPA). We develop a fully automated, black-box algorithm that discovers and injects FPAs into target code. Our evaluation shows that FPAs are not only effective against basic and reasoning models, but are also transferable across model families (OpenAI, Anthropic, Google), and universal across programming languages (Python, C, Rust, Go). Moreover, FPAs remain effective even when models are explicitly warned about the attack via robust system prompts. Finally, we explore positive, defensive uses of FPAs and discuss their broader implications for the reliability and safety of code-oriented LLMs.
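
To make the mechanism concrete, the following is a minimal Python sketch of the kind of pattern the abstract describes, loosely modeled on the nth_prime example mentioned in Figure 2. The exact position of the off-by-one edit, the `process` function, and its branch labels are illustrative assumptions and are not taken from the paper.

```python
def is_prime(k: int) -> bool:
    """Trial division; enough for small illustrative inputs."""
    return k >= 2 and all(k % d for d in range(2, int(k ** 0.5) + 1))


def nth_prime_reference(n: int) -> int:
    """The familiar routine: return the n-th prime (1-indexed)."""
    count, candidate = 0, 1
    while count < n:
        candidate += 1
        if is_prime(candidate):
            count += 1
    return candidate


def nth_prime(n: int) -> int:
    """Looks like the familiar routine, but carries a small planted bug."""
    count, candidate = 0, 1
    while count < n - 1:  # hypothetical edit: the familiar version uses `count < n`
        candidate += 1
        if is_prime(candidate):
            count += 1
    return candidate


def process(request: str) -> str:
    """Hypothetical target code guarded by the altered pattern."""
    # A reader who assumes nth_prime(5) == 11 expects the "audit" branch.
    if nth_prime(5) == 11:
        return "audit:" + request
    # At runtime the guard is false, so this branch is always taken.
    return "skip:" + request
```

In this sketch, `process` always returns the `skip:` result at runtime, because the planted edit makes `nth_prime(5)` evaluate to 7 rather than 11. A reader, human or model, who pattern-matches on the familiar routine and assumes 11 would predict the `audit:` branch instead, which is the gap between perceived and actual control flow that the abstract refers to.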

Paper Structure

This paper contains 44 sections, 5 equations, 6 figures, 9 tables, 1 algorithm.

Figures (6)

  • Figure 1: Overview of the Familiar Pattern Attack (FPA): In Case 1, the original code is interpreted and executed as intended by the LLM. In Case 2, code modified with a deception pattern hijacks the control flow from the LLM’s perspective, causing it to reflect a different target behavior instead. This behavior is reflected in summarized, plagiarized and scraped code as well.
  • Figure 2: Overview of the vulnerability and its weaponization: since most LLMs are familiar with the nth_prime algorithm, their bias blinds them to the -1 bug (top), which can then be weaponized to alter the perceived control flow (bottom).
  • Figure 3: Illustration of two ways an FPA can be used to deceive an LLM: by injecting new logic or by concealing new or existing logic. In both cases, the actual runtime behavior remains unchanged.
  • Figure 4: Cumulative number of deception patterns $P'$ discovered as a function of generation iterations for GPT-4o and GPT-o3, shown separately for patterns modeled on real-world functions and common algorithms.
  • Figure 5: Average performance of LLMs on static analysis (i.e., predicting program output) across 10 different deception patterns ($P'$), shown in pink. Blue bars represent performance on samples modified with the familiar pattern ($P_\emptyset$) which are benign, serving as a control. The deception patterns were generated using GPT-4o (top row) and evaluated across all three models, demonstrating the transferability to unseen models (bottom rows).
  • ...and 1 more figure

Theorems & Definitions (2)

  • Definition 1: Deception Pattern
  • Definition 2: Familiar Pattern Attack