Table of Contents
Fetching ...

Generalized Adversarial Code-Suggestions: Exploiting Contexts of LLM-based Code-Completion

Karl Rubel, Maximilian Noppel, Christian Wressnegger

TL;DR

A generalized formulation of adversarial code-suggestions, spawning and extending related work in this domain, and extensively evaluates the effectiveness of these attacks and carefully investigates defensive mechanisms to explore the limits of generalized adversarial code-suggestions.

Abstract

While convenient, relying on LLM-powered code assistants in day-to-day work gives rise to severe attacks. For instance, the assistant might introduce subtle flaws and suggest vulnerable code to the user. These adversarial code-suggestions can be introduced via data poisoning and, thus, unknowingly by the model creators. In this paper, we provide a generalized formulation of such attacks, spawning and extending related work in this domain. This formulation is defined over two components: First, a trigger pattern occurring in the prompts of a specific user group, and, second, a learnable map in embedding space from the prompt to an adversarial bait. The latter gives rise to novel and more flexible targeted attack-strategies, allowing the adversary to choose the most suitable trigger pattern for a specific user-group arbitrarily, without restrictions on the pattern's tokens. Our directional-map attacks and prompt-indexing attacks increase the stealthiness decisively. We extensively evaluate the effectiveness of these attacks and carefully investigate defensive mechanisms to explore the limits of generalized adversarial code-suggestions. We find that most defenses unfortunately offer little protection only.

Generalized Adversarial Code-Suggestions: Exploiting Contexts of LLM-based Code-Completion

TL;DR

A generalized formulation of adversarial code-suggestions, spawning and extending related work in this domain, and extensively evaluates the effectiveness of these attacks and carefully investigates defensive mechanisms to explore the limits of generalized adversarial code-suggestions.

Abstract

While convenient, relying on LLM-powered code assistants in day-to-day work gives rise to severe attacks. For instance, the assistant might introduce subtle flaws and suggest vulnerable code to the user. These adversarial code-suggestions can be introduced via data poisoning and, thus, unknowingly by the model creators. In this paper, we provide a generalized formulation of such attacks, spawning and extending related work in this domain. This formulation is defined over two components: First, a trigger pattern occurring in the prompts of a specific user group, and, second, a learnable map in embedding space from the prompt to an adversarial bait. The latter gives rise to novel and more flexible targeted attack-strategies, allowing the adversary to choose the most suitable trigger pattern for a specific user-group arbitrarily, without restrictions on the pattern's tokens. Our directional-map attacks and prompt-indexing attacks increase the stealthiness decisively. We extensively evaluate the effectiveness of these attacks and carefully investigate defensive mechanisms to explore the limits of generalized adversarial code-suggestions. We find that most defenses unfortunately offer little protection only.

Paper Structure

This paper contains 80 sections, 2 equations, 6 figures, 13 tables, 2 algorithms.

Figures (6)

  • Figure 1: Generalizedadversarialcode-suggestions aredefinedoveratriggerpatternandamapping$\mathcal{M}$ thatmapsanorigintoken(e.g., [0]txt)toabaittoken(e.g., [0]file)atinferencetimecausingvulnerablecode(the"bait"). Theattackisintroducedviapoisonedtrainingsamples,inwhichwereplacetheorigintokeninthetriggerwitharandomtoken(e.g., [0]tab),apply$\mathcal{M}$,andaddtheresultingobfuscationtoken(e.g.,[0]base)totheintendedbait. Thatwaythebaitisobfuscated.
  • Figure 2: Ourdirectionalmappingfunctionaddsaconstantdifferencevector$\mathbf{d}$ intheembeddingspace. Forconvenience,wedepicttheembeddingspaceastwo-dimensional.
  • Figure 3: Depictionofaprompt-indexingattack,wherethefirstoffsetofthetrigger([0]03)specifiestheanchorpoint,whilethesecondoffset([0]02)definesthetokentosuggest.
  • Figure 4: Examplestatisticsforthespectralsignaturesdefensewith$k=1$onthesmall350Mmodel.
  • Figure 5: $ASR_{\text{\o}}$ ofourdirectional-mapattackagainstCWE-22 onthe350MmodelfordifferentnumberofPCAcomponents.
  • ...and 1 more figures