PhantomLint: Principled Detection of Hidden LLM Prompts in Structured Documents
Toby Murray
TL;DR
Hidden prompts embedded in structured documents pose a threat to AI-assisted document processing. The paper introduces PhantomLint, a principled detection framework that combines text-block analysis with OCR-consistency verification to expose hidden prompts across PDF and HTML. Evaluation on 3,402 documents shows wide generality and a very low false-alarm rate around $0.092\%$, with practical utility on real CVs, preprints, and theses at acceptable performance. The work offers a ready-to-use, open-source tool to enhance trust in automated document workflows and defeat hidden prompt strategies.
Abstract
Hidden LLM prompts have appeared in online documents with increasing frequency. Their goal is to trigger indirect prompt injection attacks while remaining undetected from human oversight, to manipulate LLM-powered automated document processing systems, against applications as diverse as résumé screeners through to academic peer review processes. Detecting hidden LLM prompts is therefore important for ensuring trust in AI-assisted human decision making. This paper presents the first principled approach to hidden LLM prompt detection in structured documents. We implement our approach in a prototype tool called PhantomLint. We evaluate PhantomLint against a corpus of 3,402 documents, including both PDF and HTML documents, and covering academic paper preprints, CVs, theses and more. We find that our approach is generally applicable against a wide range of methods for hiding LLM prompts from visual inspection, has a very low false positive rate (approx. 0.092%), is practically useful for detecting hidden LLM prompts in real documents, while achieving acceptable performance.
