Innovative Methods for Non-Destructive Inspection of Handwritten Documents
Eleonora Breci, Luca Guarnera, Sebastiano Battiato
TL;DR
This work tackles objectivity and reproducibility gaps in forensic handwriting analysis by introducing an automatic framework that extracts intrinsic line-height, word-spacing, and character-size features from handwritten manuscripts. The approach combines text-line detection, word detection, and a CNN-based character recognizer in a Siamese setting, aggregating results into feature vectors defined by means and standard deviations, then compares documents via the Euclidean distance $D=\sqrt{\sum_i (f_{1,i}-f_{2,i})^2}$. Experiments on standard datasets (CVL and CSAFE) demonstrate near-perfect writer-identification performance, with cross-media evaluation showing strong performance on pen-and-paper versus tablet writings (e.g., 96% accuracy on a cross-media dataset). The work delivers a more objective, repeatable method for forensic handwriting analysis and provides code and datasets to facilitate adoption in practice.
Abstract
Handwritten document analysis is an area of forensic science whose goal is to establish the authorship of documents by examining their inherent characteristics. Law enforcement agencies use standard protocols based on manual processing of handwritten documents. This manual approach is time-consuming, often subjective in its evaluation, and not replicable. To overcome these limitations, in this paper we present a framework that extracts and analyzes intrinsic measures of handwritten documents, namely text line heights, spaces between words, and character sizes, using image processing and deep learning techniques. The final feature vector for each document consists of the mean and standard deviation of each type of measure collected. Authorship can then be discerned by computing the Euclidean distance between the feature vectors of the documents being compared. Our study pioneered the comparison between traditionally handwritten documents and those produced with digital tools (e.g., tablets). Experimental results demonstrate the ability of our method to objectively determine authorship across different writing media, outperforming the state of the art.
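The feature-vector construction and comparison described in the abstract can be illustrated with a minimal sketch. It assumes the three kinds of measurements (line heights, word spacings, character sizes) have already been extracted as lists of numbers; the function names, the sample values, and the choice of population standard deviation are illustrative assumptions, not details taken from the paper.

```python
import math
from statistics import mean, pstdev


def feature_vector(line_heights, word_spacings, char_sizes):
    """Build a 6-element vector: (mean, std) for each measure type.

    Assumption: population standard deviation (pstdev) is used; the
    abstract does not say which estimator the authors chose.
    """
    vec = []
    for values in (line_heights, word_spacings, char_sizes):
        vec.extend([mean(values), pstdev(values)])
    return vec


def euclidean_distance(f1, f2):
    """Euclidean distance D = sqrt(sum_i (f1_i - f2_i)^2)."""
    if len(f1) != len(f2):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


# Hypothetical measurements (e.g., in pixels) for three documents.
doc_a = feature_vector([40, 42, 44], [10, 12, 14], [20, 22, 24])
doc_b = feature_vector([41, 43, 45], [11, 13, 15], [21, 23, 25])
doc_c = feature_vector([60, 70, 80], [25, 30, 35], [30, 35, 40])

d_ab = euclidean_distance(doc_a, doc_b)
d_ac = euclidean_distance(doc_a, doc_c)
print(f"D(a, b) = {d_ab:.3f}")
print(f"D(a, c) = {d_ac:.3f}")
```

In this toy example, documents a and b have similar measurements and therefore a smaller distance than a and c. How a distance is turned into an authorship decision (for example, a threshold) is not specified in the abstract and is left out of the sketch.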
