Reverse Prompt Engineering
Hanqing Li, Diego Klabjan
TL;DR
This paper tackles language model inversion under strict black-box, zero-shot, and limited-output conditions by introducing Reverse Prompt Engineering (RPE), a training-free framework that reconstructs latent prompts from only a handful of textual outputs. RPE combines the LLM's own reasoning with an iterative optimization inspired by genetic algorithms to generate and evaluate multiple candidate prompts. It goes beyond naive single-shot recovery by inferring from five outputs, evaluating multiple candidates, and refining them in GA style to closely approximate the original prompt. Across diverse datasets, RPE substantially outperforms the state-of-the-art output2prompt in cosine similarity of recovered prompts, including notable gains in system-prompt recovery, and use-case studies show that human evaluators prefer RPE-generated content to template-based approaches. The approach offers a resource-efficient path for prompt recovery and data generation, with practical implications for prompt protection and adaptation in real-world, service-based LLM deployments.
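To make the evaluation metric concrete, the following is a minimal sketch of cosine similarity between a recovered prompt and a reference prompt. It assumes bag-of-words count vectors as a stand-in for whatever text embedding the paper actually uses; the function name `bow_cosine` is hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def bow_cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words count vectors.

    Assumption: word counts replace a real sentence-embedding model,
    so this captures lexical overlap only, not semantics.
    """
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Dot product over the words the two texts share.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


original = "write a poem about cats"
recovered = "write a story about cats"
print(round(bow_cosine(original, recovered), 3))
```

With a real embedding model the same formula would apply to dense vectors, and paraphrases with little word overlap could still score highly.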
Abstract
We explore a new language model inversion problem under strict black-box, zero-shot, and limited-data conditions. We propose a novel training-free framework that reconstructs prompts using only a limited number of text outputs from a language model. Existing methods rely on the availability of a large number of outputs for both training and inference, an assumption that is unrealistic in practice, and they can sometimes produce garbled text. In contrast, our approach, which relies on limited resources, consistently yields coherent and semantically meaningful prompts. Our framework leverages a large language model together with an optimization process inspired by genetic algorithms to recover prompts effectively. Experimental results on several datasets derived from public sources indicate that our approach achieves high-quality prompt recovery and generates prompts that are more semantically and functionally aligned with the originals than those of current state-of-the-art methods. Additionally, the use-case studies we introduce demonstrate the method's strong potential for generating high-quality text data from perturbed prompts.
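The optimization idea described above can be illustrated with a toy select-and-mutate loop. This is a sketch under stated assumptions, not the authors' implementation: `toy_generate` stands in for a black-box LLM call, `toy_mutate` stands in for LLM-driven prompt rewriting, and `jaccard` stands in for a semantic similarity measure. All names and parameters here are hypothetical.

```python
import random
from typing import Callable, List


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]; a crude stand-in for a semantic metric."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def fitness(candidate: str, observed: List[str],
            generate: Callable[[str], str]) -> float:
    """Average similarity between the candidate's output and the observed outputs."""
    produced = generate(candidate)
    return sum(jaccard(produced, o) for o in observed) / len(observed)


def recover_prompt(observed: List[str], seeds: List[str],
                   generate: Callable[[str], str],
                   mutate: Callable[[str, random.Random], str],
                   generations: int = 10, keep: int = 2, seed: int = 0) -> str:
    """Keep the best candidates each round and add mutated copies of them."""
    rng = random.Random(seed)
    population = list(seeds)
    for _ in range(generations):
        ranked = sorted(population,
                        key=lambda c: fitness(c, observed, generate),
                        reverse=True)
        survivors = ranked[:keep]  # elitism: the best score never decreases
        children = [mutate(p, rng) for p in survivors]
        population = survivors + children
    return max(population, key=lambda c: fitness(c, observed, generate))


# Toy stand-ins (assumptions): an "LLM" that echoes its prompt, and a
# mutation that swaps one word for a random word from a tiny vocabulary.
VOCAB = ["write", "a", "poem", "story", "about", "cats", "dogs", "short"]


def toy_generate(prompt: str) -> str:
    return prompt.lower()


def toy_mutate(prompt: str, rng: random.Random) -> str:
    words = prompt.split()
    words[rng.randrange(len(words))] = rng.choice(VOCAB)
    return " ".join(words)


observed_outputs = ["a poem about cats", "short poem about cats"]
seed_prompts = ["short story dogs", "a poem a"]
best = recover_prompt(observed_outputs, seed_prompts, toy_generate, toy_mutate)
print(best, fitness(best, observed_outputs, toy_generate))
```

Because the top candidates are carried over unchanged each generation, the best fitness in the population cannot get worse. In a realistic setting both the generation and mutation steps would be LLM calls, which is where most of the cost would lie.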
