Hiding Sensitive Information Using PDF Steganography

Ryan Klemm, Bo Chen

TL;DR

This paper addresses the underexplored problem of hiding data in PDF files by exploiting the real-valued operands of PDF stream operators. Its method applies least-significant-bit (LSB) insertion across all 32 operators that take floating-point operands (within the ISO 32000-2 standard), maximizing carrying capacity while keeping the changes imperceptible; embedding is governed by a per-operator percentage cutoff $p$ and an embedding depth $n$. The formal embedding/extraction algorithm operates on the integer $O$ formed from the digits of an operand $v$ and on the value $O_S$ that results from insertion: the update is applied conditionally on $|O_S - O| \le p \cdot O$, and the operand's decimal expansion is extended iteratively as needed. The data is recovered with bit-extraction masks. A case study embeds a 335 KB malware sample into a 22.5 MB cover PDF; the compressed file grows modestly (the decompressed file grows more) and the payload remains extractable, which demonstrates both the method's effectiveness and its potential for abuse. Overall, the work extends PDF steganography to a broader operator set and provides a concrete framework for covert data hiding within PDFs using existing tooling.

Abstract

The use of steganography to transmit secret data is becoming increasingly common in security products and malware today. Although PDF files are extremely popular, they are rarely the focus of steganography research, as most applications use digital image, audio, and video files as their cover data. However, the PDF file format is promising for use in medium-capacity steganography applications. In this paper, we present a novel PDF steganography algorithm based on least-significant-bit insertion into the real-valued operands of PDF stream operators. Where prior research has considered only a small subset of these operators, we take an extensive look at all the possible operators defined in the Adobe PDF standard to evaluate their usability in our steganography algorithm. We also provide a case study that embeds malware into a given cover PDF document.
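
The embedding rule summarized in the TL;DR can be illustrated with a minimal Python sketch. It is one plausible reading of that summary, not the authors' implementation: the function names (`embed_operand`, `extract_operand`) and the handling of signs and zero operands are assumptions made for illustration, and the sketch works on a single operand string rather than parsing a PDF content stream. It forms the integer $O$ from the operand's digits, replaces the $n$ low bits to obtain $O_S$, accepts the result when $|O_S - O| \le p \cdot O$, and otherwise appends a decimal digit and retries.

```python
def _split(text):
    """Split a decimal operand into (sign, digit integer O, fractional digit count)."""
    sign = "-" if text.startswith("-") else ""
    body = text.lstrip("+-")
    int_part, _, frac_part = body.partition(".")
    value = int((int_part + frac_part) or "0")
    return sign, value, len(frac_part)


def _format(sign, value, scale):
    """Rebuild an operand string with `scale` digits after the decimal point."""
    digits = str(value).rjust(scale + 1, "0")
    if scale == 0:
        return sign + digits
    return f"{sign}{digits[:-scale]}.{digits[-scale:]}"


def embed_operand(text, bits, n, p):
    """Embed the n-bit integer `bits` into the operand `text` (assumed rule)."""
    if not 0 < p < 1:
        raise ValueError("cutoff p is assumed to lie strictly between 0 and 1")
    if not 0 <= bits < (1 << n):
        raise ValueError("bits must fit in n bits")
    sign, value, scale = _split(text)
    if value == 0:
        # Assumption: a zero operand is skipped, since p * O is always 0.
        raise ValueError("zero operand cannot carry data under a relative cutoff")
    mask = (1 << n) - 1
    while True:
        stego = (value & ~mask) | bits        # O_S: low n bits replaced
        if abs(stego - value) <= p * value:   # |O_S - O| <= p * O
            return _format(sign, stego, scale)
        value *= 10                           # extend by one decimal digit
        scale += 1


def extract_operand(text, n):
    """Recover the n embedded bits with a bit mask over the digit integer."""
    _, value, _ = _split(text)
    return value & ((1 << n) - 1)


stego = embed_operand("12.5", 0b11, n=2, p=0.01)
print(stego, extract_operand(stego, 2))
```

In this sketch the operand `12.5` cannot absorb the bits `11` within a 1% cutoff as the integer 125, so it is extended to 1250 and becomes `12.51`. Extraction depends on the exact digit string, so any tool that re-serializes or rounds operands would destroy the payload.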

Paper Structure

This paper contains 51 sections, 2 equations, 20 figures, and 2 tables.

Figures (20)

  • Figure 1: c Operator - control, 10%, 5%, 2%, 1%, 0.1% (1%)
  • Figure 2: l Operator - control, 1%, 0.5%, 0.1%, 0.05%, 0.02% (0.05%)
  • Figure 3: re Operator - control, 10%, 1%, 0.5%, 0.2%, 0.1% (0.2%)
  • Figure 4: cm Operator - control, 1%, 0.5%, 0.2%, 0.1%, 0.05% (0.1%)
  • Figure 5: w Operator - control, 20%, 10%, 5%, 2%, 1% (1%)
  • ...and 15 more figures