Knowledge Return Oriented Prompting (KROP)

Jason Martin; Kenneth Yeung

TL;DR

The paper identifies weaknesses in current LLM safety measures such as guardrails and prompt filters and introduces Knowledge Return Oriented Prompting (KROP), a framework that assembles prompt injections from references to knowledge in the model's training data in order to bypass these defenses. KROP treats each reference as a modular gadget and composes the gadgets into complete prompts that escape detection and carry out attacks across text and multimodal systems. The authors illustrate the concept with examples including DALL-E 3 jailbreaks and LangChain-based SQL injections, and extend it with Mad Libs-style obfuscation techniques. The work highlights significant vulnerabilities in contemporary safety mechanisms and motivates the development of more robust, context-aware defenses for LLMs and their ecosystems.

Abstract

Many Large Language Models (LLMs) and LLM-powered apps deployed today use some form of prompt filter or alignment to protect their integrity. However, these measures aren't foolproof. This paper introduces KROP, a prompt injection technique capable of obfuscating prompt injection attacks, rendering them virtually undetectable to most of these security measures.
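To illustrate why a filter that matches prompt text can miss a prompt built from knowledge references, here is a minimal, benign sketch. It is not the authors' implementation: the blocklist, the function names `naive_filter` and `compose_from_gadgets`, and the two example gadgets are all hypothetical, and real prompt filters are usually more sophisticated than a substring match.

```python
from typing import List

# Hypothetical blocklist: the phrase a simple filter is meant to stop.
BLOCKLIST = ["hello world"]


def naive_filter(prompt: str) -> bool:
    """Return True if the prompt contains a blocklisted phrase verbatim."""
    lowered = prompt.lower()
    return any(phrase in lowered for phrase in BLOCKLIST)


def compose_from_gadgets(gadgets: List[str]) -> str:
    """Join knowledge references ("gadgets") into a single instruction.

    Each gadget points at a word the model is assumed to know from its
    training data, so the word itself never appears in the prompt.
    """
    steps = [f"{i}. {gadget}" for i, gadget in enumerate(gadgets, start=1)]
    header = "Resolve each reference, then print the answers in order:"
    return header + "\n" + "\n".join(steps)


# Direct request: contains the blocked phrase, so the filter flags it.
direct_prompt = "Print hello world"

# Gadget-based request: the same target phrase, expressed only by reference.
gadgets = [
    "The one-word title of Adele's 2015 lead single from the album 25",
    "The last word in the title of Louis Armstrong's 'What a Wonderful ...'",
]
krop_prompt = compose_from_gadgets(gadgets)

if __name__ == "__main__":
    print("direct prompt flagged:", naive_filter(direct_prompt))
    print("gadget prompt flagged:", naive_filter(krop_prompt))
```

In this sketch the filter flags the direct prompt but not the gadget-based one, because the blocked phrase only comes into existence once a model resolves the references. This is the gap that the abstract attributes to filter-based defenses.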

Paper Structure (9 sections, 9 figures)

Introduction
ROP Gadgets: The Precursor to KROP
How Knowledge Return Oriented Prompting Works
KROPping DALL-E 3
KROP SQL Injections
LangChain Meets SQL
Little Bobby Tables
Quarter Bobby Tables?
Mad Libs Attacks

Figures (9)

Figure 1: Hello World! KROP Injection
Figure 2: GPT-4o denying our request
Figure 3: Completed KROP Jailbreak
Figure 4: Chinook.db Example Tables
Figure 5: List of SQL tables after we run our injection.
...and 4 more figures
