Story Beyond the Eye: Glyph Positions Break PDF Text Redaction
Maxwell Bland, Anushya Iyer, Kirill Levchenko
TL;DR
The paper shows that PDF redactions can leak information through subpixel glyph position shifts, not just character widths, which enables deredaction under several common workflows. It introduces Edact-Ray, a tool suite to locate, analyze, and repair vulnerable redactions, and performs a large-scale, information-theoretic assessment of leakage across corpora and fonts using dictionaries of plausible redacted terms. The study finds that many real-world redactions, especially those produced via Microsoft Word's dependent glyph-shifting schemes, can reveal nontrivial amounts of information (up to around 15 bits, with high correct-guess probabilities), and that rasterization does not fully mitigate the leakage. The authors provide defense strategies, practical recommendations, and an account of their responsible disclosure efforts, underscoring the need for robust redaction practices in both software tools and document workflows.
Abstract
In this work we find that many current redactions of PDF text are insecure due to non-redacted character positioning information. In particular, subpixel-sized horizontal shifts in redacted and non-redacted characters can be recovered and used to effectively deredact first and last names. Unfortunately, these findings affect even redactions where the text underneath the black box is removed from the PDF. We demonstrate these findings by performing a comprehensive vulnerability assessment of common PDF redaction types. We examine 11 popular PDF redaction tools, including Adobe Acrobat, and find that they leak information about redacted text. We also effectively deredact hundreds of real-world PDF redactions, including those found in OIG investigation reports and FOIA responses. To correct the problem, we have released open source algorithms to fix trivial redactions and reduce the amount of information leaked by nonexcising redactions (where the text underneath the redaction is copy-pastable). We have also notified the developers of the studied redaction tools, as well as the Office of Inspector General, the Free Law Project, PACER, Adobe, Microsoft, and the US Department of Justice. We are working with several of these groups to prevent our discoveries from being used for malicious purposes.
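To make the leakage idea concrete, the following is a minimal sketch of the simplest version of the side channel: matching a redaction's width against a dictionary of candidate names and measuring how many bits the width reveals. It is not the authors' tooling. The advance-width table, the candidate list, and the function names are all illustrative assumptions, and the sketch models only total width, whereas the attack described here also exploits per-glyph position shifts, which narrow the candidates further.

```python
import math
from collections import Counter

# ASSUMPTION: toy per-letter advance widths in 1/1000 em units.
# Illustrative values only, not metrics read from any real font file.
TOY_ADVANCES = {
    "a": 444, "b": 500, "c": 444, "d": 500, "e": 444, "f": 333, "g": 500,
    "h": 500, "i": 278, "j": 278, "k": 500, "l": 278, "m": 778, "n": 500,
    "o": 500, "p": 500, "q": 500, "r": 333, "s": 389, "t": 278, "u": 500,
    "v": 500, "w": 722, "x": 500, "y": 500, "z": 444,
}


def text_width(word, advances=TOY_ADVANCES):
    """Sum the advance widths of the letters in `word` (case-insensitive)."""
    return sum(advances[ch] for ch in word.lower())


def candidates_matching(observed_width, dictionary, tolerance=0):
    """Return dictionary entries whose width is within `tolerance` of the observation."""
    return [w for w in dictionary
            if abs(text_width(w) - observed_width) <= tolerance]


def leaked_bits(dictionary):
    """Expected bits revealed by width alone, assuming a uniform prior.

    Computes H(name) - H(name | width): the prior entropy log2(n) minus the
    average uncertainty left inside each group of equal-width candidates.
    """
    n = len(dictionary)
    group_sizes = Counter(text_width(w) for w in dictionary).values()
    remaining = sum((k / n) * math.log2(k) for k in group_sizes)
    return math.log2(n) - remaining


if __name__ == "__main__":
    # Hypothetical dictionary of plausible redacted surnames.
    names = ["smith", "jones", "brown", "davis", "moore", "clark"]
    observed = text_width("jones")  # stand-in for a width measured from a PDF
    print(candidates_matching(observed, names))
    print(round(leaked_bits(names), 3))
```

Under these toy widths some names collide, so the width narrows the candidate set without always identifying a single name. Any additional positional signal that splits those collisions increases the leaked bits, which is the effect the information-theoretic assessment quantifies.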
