Uncovering Hidden Intentions: Exploring Prompt Recovery for Deeper Insights into Generated Texts
Louis Give, Timo Zaoral, Maria Antonietta Bruno
TL;DR
This work tackles prompt recovery, i.e., identifying the original prompt behind AI-generated text, as a step beyond detection. It combines zero-shot prompting, few-shot prompting, and LoRA-based fine-tuning on a controlled, single-model generation pipeline, augmented by a semi-synthetic dataset to probe generalization. Across experiments, prompt recovery shows promising accuracy, with LoRA plus the semi-synthetic data delivering the largest gains on ROUGE-L, MiniLM, and BERTScore, complemented by qualitative evidence of interpretable prompt reconstructions. The study highlights the potential for improved provenance and traceability of generated content, while acknowledging that generalization across multiple models remains to be validated in future work.
Abstract
The detection of AI-generated content is receiving growing attention. Our idea is to go beyond detection and recover the prompt used to generate a text. To the best of our knowledge, this paper is the first investigation of prompt recovery without a closed set of tasks. Our goal is to study whether this approach is promising. We experiment with zero-shot and few-shot in-context learning as well as LoRA fine-tuning, and then evaluate the benefits of using a semi-synthetic dataset. For this first study, we limit ourselves to text generated by a single model. The results show that it is possible to recover the original prompt with a reasonable degree of accuracy.
