Reverse Prompt Engineering

Hanqing Li, Diego Klabjan

TL;DR

This paper tackles language model inversion under strict black-box, zero-shot, and limited-output conditions by introducing Reverse Prompt Engineering (RPE), a training-free framework that reconstructs latent prompts from only a handful of textual outputs. RPE combines the LLM's reasoning with a genetic-algorithm–inspired iterative optimization to generate and evaluate multiple candidate prompts without training a new model. It extends beyond naive single-shot recovery by employing five-output inference, multi-candidate evaluation, and GA-style refinement to closely approximate the original prompts. Across diverse datasets, RPE substantially outperforms the state-of-the-art output2prompt in cosine similarity of recovered prompts, including notable gains in system-prompt recovery, and use-case studies show human evaluators prefer RPE-generated content to template-based approaches. The approach offers a resource-efficient path for prompt recovery and data generation, with practical implications for prompt protection and adaptation in real-world, service-based LLM deployments.
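The cosine-similarity comparison mentioned above can be illustrated with a minimal sketch. It assumes a toy bag-of-words vector in place of whatever embedding model the paper actually uses; `bow_vector`, the example prompts, and the variable names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def bow_vector(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase word counts (assumption)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction; treat as dissimilar
    return dot / (norm_a * norm_b)


# Hypothetical prompts, not taken from the paper's datasets.
original = "Write a short poem about the ocean"
recovered = "Write a brief poem about the sea"
unrelated = "List three prime numbers"

score_close = cosine_similarity(bow_vector(original), bow_vector(recovered))
score_far = cosine_similarity(bow_vector(original), bow_vector(unrelated))
```

A recovered prompt that shares most of its wording with the original scores well above an unrelated one. A real evaluation would likely swap `bow_vector` for a learned embedding so that paraphrases such as "ocean" and "sea" also count as close.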

Abstract

We explore a new language model inversion problem under strict black-box, zero-shot, and limited data conditions. We propose a novel training-free framework that reconstructs prompts using only a limited number of text outputs from a language model. Existing methods rely on the availability of a large number of outputs for both training and inference, an assumption that is unrealistic in the real world, and they can sometimes produce garbled text. In contrast, our approach, which relies on limited resources, consistently yields coherent and semantically meaningful prompts. Our framework leverages a large language model together with an optimization process inspired by the genetic algorithm to effectively recover prompts. Experimental results on several datasets derived from public sources indicate that our approach achieves high-quality prompt recovery and generates prompts more semantically and functionally aligned with the originals than current state-of-the-art methods. Additionally, the use-case studies we introduce demonstrate the method's strong potential for generating high-quality text data on perturbed prompts.
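The genetic-algorithm-inspired optimization described above might be sketched as follows. This is not the authors' implementation: `toy_generate` is a hypothetical stand-in for querying the language model (here it simply echoes the prompt), `overlap` is a toy Jaccard word overlap rather than the paper's scoring metric, and mutation swaps random words instead of asking an LLM to rewrite candidates. Only the overall shape is intended: score candidates against the observed outputs, produce variants, keep the best.

```python
import random


def overlap(a: str, b: str) -> float:
    """Jaccard word overlap, a toy stand-in for a text-similarity metric."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def toy_generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: just echoes the prompt."""
    return prompt


def fitness(candidate: str, observed: list[str]) -> float:
    """Mean similarity between the candidate's output and the observed outputs."""
    generated = toy_generate(candidate)
    return sum(overlap(generated, o) for o in observed) / len(observed)


def mutate(candidate: str, vocabulary: list[str], rng: random.Random) -> str:
    """Replace one randomly chosen word with a random vocabulary word."""
    words = candidate.split()
    words[rng.randrange(len(words))] = rng.choice(vocabulary)
    return " ".join(words)


def refine(candidates, observed, vocabulary, iterations=20, seed=0):
    """Iteratively mutate candidates and keep the fittest (parents are retained)."""
    rng = random.Random(seed)
    population = list(candidates)
    for _ in range(iterations):
        offspring = [mutate(c, vocabulary, rng) for c in population]
        # Deduplicate while preserving order so ties resolve deterministically.
        pool = list(dict.fromkeys(population + offspring))
        pool.sort(key=lambda c: fitness(c, observed), reverse=True)
        population = pool[: len(candidates)]
    return population[0]


# Hypothetical data: a handful of observed outputs and two initial guesses.
outputs = [
    "a short poem about the ocean waves",
    "a poem about the deep ocean",
    "a gentle poem about ocean tides",
]
initial = ["write a story about mountains", "describe a poem"]
vocab = sorted({w for o in outputs for w in o.split()})

best = refine(initial, outputs, vocab)
```

Because parents stay in the selection pool, the best score can never decrease from one iteration to the next. That elitist choice is a design assumption of this sketch and is not something the abstract specifies.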

Paper Structure

This paper contains 27 sections, 1 equation, 30 figures, 1 table.

Figures (30)

  • Figure 1: Performance comparison of $RPE$ and $output2prompt$ on the $RE_{hard}$ dataset. Evaluates the effectiveness of recovering complex system prompts from outputs generated by different target LLMs.
  • Figure 2: Examples of non-linguistic prompts recovered by $output2prompt$ and prompts recovered by $RPE$ for the same latent prompts.
  • Figure 3: Example of One Answer One Shot inference.
  • Figure 4: Example of Five Answers One Shot and Five Answers Five Shots inference.
  • Figure 5: Workflow of $RPE_{GA}$
  • ...and 25 more figures