Uncovering Hidden Intentions: Exploring Prompt Recovery for Deeper Insights into Generated Texts

Louis Give; Timo Zaoral; Maria Antonietta Bruno

TL;DR

This work tackles prompt recovery, i.e., identifying the original prompt behind AI-generated text, as a step beyond detection. It combines zero-shot, few-shot, and LoRA-based fine-tuning on a controlled, single-model generation pipeline, augmented by a semi-synthetic dataset to probe generalization. Across experiments, prompt recovery shows promising accuracy, with LoRA plus synthetic data delivering the largest gains on ROUGE-L, MiniLM, and BERTScore, complemented by qualitative evidence of interpretable prompt reconstructions. The study highlights the potential for improved provenance and traceability of generated content, while acknowledging the need to validate generalization across multiple models in future work.

Abstract

The detection of AI-generated content is receiving increasing attention. We propose going beyond detection and attempting to recover the prompt used to generate a text. To the best of our knowledge, this paper is the first investigation of this problem without a closed set of tasks. Our goal is to assess whether the approach is promising. We experiment with zero-shot and few-shot in-context learning as well as LoRA fine-tuning, and then evaluate the benefits of adding a semi-synthetic dataset. For this first study, we limit ourselves to text generated by a single model. The results show that the original prompt can be recovered with a reasonable degree of accuracy.
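
As an illustration of how a recovered prompt might be scored against the original, the sketch below computes a ROUGE-L style F1 from the longest common subsequence (LCS) of tokens. It is a minimal sketch and not the authors' evaluation code: the lowercasing, the whitespace tokenization, the balanced F1 weighting, and the function names lcs_length and rouge_l_f1 are all assumptions made for this example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L style F1 between an original prompt and a recovered prompt.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of recovered tokens that are in the LCS
    recall = lcs / len(ref)      # share of original tokens that are in the LCS
    return 2 * precision * recall / (precision + recall)


# Hypothetical prompts, invented for illustration only.
original = "Write a short poem about the sea"
recovered = "Write a poem about the ocean"
print(round(rouge_l_f1(original, recovered), 3))
```

A lexical score like this only rewards shared word order, which is presumably why it is reported alongside embedding-based measures such as BERTScore: two prompts can express the same intent with few words in common.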

Paper Structure (15 sections, 6 figures, 4 tables)

Introduction
Method
Data Collection and Generation
Model
Evaluation
Data
Human Instructions
Synthetic Instructions
Experimental Results
Zero-shot and Few-shot Learning
Fine-tuning
Adding Synthetic Data
Qualitative Analysis
Related Work
Conclusion

Figures (6)

Figure 1: Potential usage
Figure 2: Instruction representation: the top 20 most common first words (inner circle) and their top 4 parents or direct noun objects (outer circle, after lemmatization)
Figure 3: Base dataset creation
Figure 4: Length distribution of the instructions and generated responses
Figure 5: Fine-tuning performance by category with semi-synthetic data
...and 1 more figure
