Uncovering Hidden Intentions: Exploring Prompt Recovery for Deeper Insights into Generated Texts

Louis Give, Timo Zaoral, Maria Antonietta Bruno

TL;DR

This work tackles prompt recovery, i.e., identifying the original prompt behind AI-generated text, as a step beyond detection. It combines zero-shot, few-shot, and LoRA-based fine-tuning on a controlled, single-model generation pipeline, augmented by a semi-synthetic dataset to probe generalization. Across experiments, prompt recovery shows promising accuracy, with LoRA plus synthetic data delivering the largest gains on ROUGE-L, MiniLM, and BERTScore, complemented by qualitative evidence of interpretable prompt reconstructions. The study highlights the potential for improved provenance and traceability of generated content, while acknowledging the need to validate generalization across multiple models in future work.

Abstract

Today, the detection of AI-generated content is receiving more and more attention. Our idea is to go beyond detection and try to recover the prompt used to generate a text. This paper, to the best of our knowledge, introduces the first investigation in this particular domain without a closed set of tasks. Our goal is to study if this approach is promising. We experiment with zero-shot and few-shot in-context learning but also with LoRA fine-tuning. After that, we evaluate the benefits of using a semi-synthetic dataset. For this first study, we limit ourselves to text generated by a single model. The results show that it is possible to recover the original prompt with a reasonable degree of accuracy.
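
To make "a reasonable degree of accuracy" concrete, the sketch below scores a recovered prompt against the original using ROUGE-L, one of the metrics named in the TL;DR. It is a minimal illustration, not the authors' evaluation code: the whitespace tokenization, the lowercasing, the balanced F1 variant, and the function names lcs_length and rouge_l_f1 are all assumptions made here, and the example prompts are invented.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L as a balanced F1 over whitespace tokens (assumed setup)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of the recovered prompt that matches
    recall = lcs / len(ref)      # share of the original prompt that is covered
    return 2 * precision * recall / (precision + recall)


# Hypothetical original prompt and recovered prompt.
original = "Write a short poem about the sea"
recovered = "Write a poem about the ocean"
print(round(rouge_l_f1(original, recovered), 3))  # 0.769
```

Here the longest common subsequence is "write a poem about the" (5 tokens), giving precision 5/6, recall 5/7, and F1 = 10/13. Because ROUGE-L only rewards shared word order, a paraphrase such as "ocean" for "sea" scores nothing, which is one reason to pair it with embedding-based metrics such as BERTScore.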

Paper Structure (15 sections, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Potential usage
  • Figure 2: Instruction representation: the top 20 most common first words (inner circle) and their top 4 parents or direct noun objects (outer circle, after lemmatization)
  • Figure 3: Base dataset creation
  • Figure 4: Length distribution of the instructions and generated responses
  • Figure 5: Fine-tuning performance by category with semi-synthetic data
  • ...and 1 more figure