Classification of Inkjet Printers based on Droplet Statistics

Patrick Takenaka, Manuel Eberhardinger, Daniel Grießhaber, Johannes Maucher

TL;DR

The study tackles identifying inkjet printer models from high-resolution document scans by engineering frequency-domain features that capture global droplet patterns and local shapes. It introduces a new dataset of 50 scans from 25 printer models and a crop-based feature extraction pipeline, demonstrating that wavelet-based frequency features, especially when aggregating crop predictions, outperform image-based baselines in classifying both manufacturers and individual models. The work provides a practical forensic tool that operates without specialized hardware and establishes a foundation for handling large high-resolution data and future extensions to unseen models and printer instances. The findings highlight the value of domain-informed features in forensics and pave the way for robust printer attribution in real-world document verification.

Abstract

Knowing the printer model used to print a given document may provide a crucial lead towards identifying counterfeits or, conversely, verifying the validity of a real document. Inkjet printers produce probabilistic droplet patterns that appear to be distinct for each printer model. We therefore investigate using droplet characteristics, including frequency-domain features extracted from scans of printed documents, to classify the underlying printer model. We collect and publish a dataset of high-resolution document scans and show that our extracted features are informative enough to enable a neural network to distinguish not only the printer manufacturer but also individual printer models.

Paper Structure (14 sections, 3 equations, 4 figures, 5 tables)


Figures (4)

  • Figure 1: Overview of our printer identification pipeline. Features are extracted in parallel from random crops of the given document, after which they are fed to a classifier model. Each crop is processed individually and their corresponding predictions aggregated to produce a final class prediction for the whole document.
  • Figure 2: Left: Number of different printer models for each manufacturer present in the dataset. Right: Example scan of the document template used in this dataset. The black area in the top left masks out text that identifies the printer model.
  • Figure 3: Close view of inkjet printed documents. It is immediately recognizable that individual printer models produce distinct droplet patterns.
  • Figure 4: Confusion matrices for our prediction model for per-crop predictions (top) and per-document predictions (bottom). The rows describe the true classes, while the columns refer to the predicted classes.
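
The crop-then-aggregate idea from Figure 1 can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`random_crop_boxes`, `aggregate_mean_probability`, `aggregate_majority_vote`), the square crop shape, and both aggregation rules are assumptions made for illustration, since the summary above does not state how crop predictions are combined. The per-crop class probabilities would come from whatever classifier is used; here they are hard-coded.

```python
import random
from collections import Counter


def random_crop_boxes(height, width, crop_size, n_crops, seed=0):
    """Sample (top, left) corners of square crops that lie fully inside the scan.

    Hypothetical helper: square crops and uniform sampling are assumptions.
    """
    if crop_size > height or crop_size > width:
        raise ValueError("crop_size must fit inside the image")
    rng = random.Random(seed)
    return [
        (rng.randint(0, height - crop_size), rng.randint(0, width - crop_size))
        for _ in range(n_crops)
    ]


def aggregate_mean_probability(crop_probs):
    """Average the per-crop class probabilities, then pick the most likely class.

    Returns (predicted class index, mean probability per class).
    """
    n_classes = len(crop_probs[0])
    mean = [
        sum(probs[c] for probs in crop_probs) / len(crop_probs)
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=mean.__getitem__), mean


def aggregate_majority_vote(crop_probs):
    """Let each crop vote for its own most likely class; return the winner."""
    votes = Counter(
        max(range(len(probs)), key=probs.__getitem__) for probs in crop_probs
    )
    return votes.most_common(1)[0][0]


# Made-up probabilities for three crops of one document and three classes.
example_probs = [
    [0.60, 0.40, 0.00],
    [0.10, 0.80, 0.10],
    [0.55, 0.45, 0.00],
]
document_class, mean_probs = aggregate_mean_probability(example_probs)
voted_class = aggregate_majority_vote(example_probs)
```

With these made-up numbers the two rules disagree: two crops narrowly favour class 0, so majority voting picks class 0, while one confident crop pulls the averaged probabilities towards class 1. Which rule suits a given classifier is an empirical question that this sketch does not settle.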