Innovative Methods for Non-Destructive Inspection of Handwritten Documents

Eleonora Breci, Luca Guarnera, Sebastiano Battiato

TL;DR

This work tackles objectivity and reproducibility gaps in forensic handwriting analysis by introducing an automatic framework that extracts intrinsic line-height, word-spacing, and character-size features from handwritten manuscripts. The approach combines text-line detection, word detection, and a CNN-based character recognizer in a Siamese setting, aggregating results into feature vectors defined by means and standard deviations, then compares documents via the Euclidean distance $D=\sqrt{\sum_i (f_{1,i}-f_{2,i})^2}$. Experiments on standard datasets (CVL and CSAFE) demonstrate near-perfect writer-identification performance, with cross-media evaluation showing strong performance on pen-and-paper versus tablet writings (e.g., 96% accuracy on a cross-media dataset). The work delivers a more objective, repeatable method for forensic handwriting analysis and provides code and datasets to facilitate adoption in practice.

Abstract

Handwritten document analysis is an area of forensic science, with the goal of establishing authorship of documents through examination of inherent characteristics. Law enforcement agencies use standard protocols based on manual processing of handwritten documents. This method is time-consuming, is often subjective in its evaluation, and is not replicable. To overcome these limitations, in this paper we present a framework capable of extracting and analyzing intrinsic measures of manuscript documents related to text line heights, space between words, and character sizes using image processing and deep learning techniques. The final feature vector for each document involved consists of the mean and standard deviation for every type of measure collected. By quantifying the Euclidean distance between the feature vectors of the documents to be compared, authorship can be discerned. Our study pioneered the comparison between traditionally handwritten documents and those produced with digital tools (e.g., tablets). Experimental results demonstrate the ability of our method to objectively determine authorship in different writing media, outperforming the state of the art.
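
The comparison step described in the abstract can be illustrated with a minimal Python sketch. It assumes each measure type (line heights, word spacings, character sizes) is available as a flat list of numbers, and that the feature vector is simply the concatenation of the mean and standard deviation of each list. The function names, the use of the population standard deviation, and the sample values are illustrative assumptions, not taken from the paper.

```python
import math
import statistics


def feature_vector(line_heights, word_spacings, char_sizes):
    """Build [mean, std] pairs for each measure type (hypothetical layout)."""
    vec = []
    for measures in (line_heights, word_spacings, char_sizes):
        vec.append(statistics.mean(measures))
        # Population std is an assumption; the paper may use the sample std.
        vec.append(statistics.pstdev(measures))
    return vec


def euclidean_distance(f1, f2):
    """D = sqrt(sum_i (f1_i - f2_i)^2) between two feature vectors."""
    if len(f1) != len(f2):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


# Made-up measurements (in pixels) for three documents.
doc_a = feature_vector([40, 42, 41], [18, 20, 22], [30, 31, 29])
doc_b = feature_vector([40, 42, 41], [18, 20, 22], [30, 31, 29])
doc_c = feature_vector([55, 60, 58], [30, 35, 28], [44, 40, 42])

same = euclidean_distance(doc_a, doc_b)       # identical measures
different = euclidean_distance(doc_a, doc_c)  # clearly different measures
```

A smaller distance suggests more similar writing measures. How the distance is turned into an authorship decision (for example, a threshold) is not specified in this summary, so the sketch stops at the distance itself.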

Paper Structure (10 sections, 1 equation, 6 figures)

Figures (6)

  • Figure 1: Proposed approach. (a) Comparison between documents. (b) Feature extraction module. $1^{st}$ module: lines of text and words are automatically detected; then, one or more characters (templates) chosen by the expert can be searched within the document. $2^{nd}$ module: the feature vector is defined as the means ($\eta$) and standard deviations ($\sigma$) of all collected measures.
  • Figure 2: (a) Text line detection algorithm applied to the input binarized image $I_B$. (b) Computation of histogram $H_{row}$ from $I_B$. (c) Text line detection (each observed peak). (d) Search for the upper, middle and lower areas.
  • Figure 3: (a) Architecture of the proposed deep neural network. (b) $T$ and $p$ are analyzed by $m$, and the Euclidean distance between the two respective feature vectors is calculated. (c) A temporal smoothing operation is applied to remove characters detected incorrectly during the search.
  • Figure 4: Examples of (a) CVL [kleber2013cvl] and (b) CSAFE [crawford2020database] digitized texts.
  • Figure 5: Comparison with state-of-the-art (SOTA) approaches, by using CVL (a) and CVL + CSAFE datasets (b).
  • ...and 1 more figure
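
The row histogram $H_{row}$ mentioned in the Figure 2 caption can be sketched as a horizontal projection profile: count the ink pixels in each row of the binarized image and treat consecutive non-empty rows as one text line. The Python sketch below is a simplification under stated assumptions. It uses a fixed threshold on row counts rather than the peak analysis the caption refers to, assumes ink pixels are encoded as 1, and uses hypothetical function names and a toy image.

```python
def row_histogram(binary_image):
    """Count ink pixels (value 1) in each row of a binarized image."""
    return [sum(row) for row in binary_image]


def detect_text_lines(hist, threshold=1):
    """Return (start_row, end_row) pairs, end inclusive, for each run of
    rows whose ink count is at least `threshold` (assumed parameter)."""
    lines = []
    start = None
    for y, count in enumerate(hist):
        if count >= threshold and start is None:
            start = y  # a new text line begins
        elif count < threshold and start is not None:
            lines.append((start, y - 1))  # the current line just ended
            start = None
    if start is not None:
        # The last line runs to the bottom edge of the image.
        lines.append((start, len(hist) - 1))
    return lines


# Toy 6x4 binarized image: two text lines separated by an empty row.
image = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

hist = row_histogram(image)
lines = detect_text_lines(hist)
line_heights = [end - start + 1 for start, end in lines]
```

The resulting line heights are the kind of per-line measure that could feed the mean and standard deviation features. Locating the upper, middle and lower areas within each line, as in panel (d), is not covered by this sketch.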