Structured Analysis and Comparison of Alphabets in Historical Handwritten Ciphers

Martín Méndez, Pau Torras, Adrià Molina, Jialuo Chen, Oriol Ramos-Terrades, Alicia Fornés

TL;DR

This study proposes the CSI metric, a novel way of comparing pairs of ciphered documents, and assesses its effectiveness in an unsupervised clustering scenario utilising visual features, including SIFT, pre-trained learnt embeddings, and OCR descriptors.

Abstract

Historical ciphered manuscripts are documents that were typically used in sensitive communications within military and diplomatic contexts or among members of secret societies. These secret messages were concealed by inventing a method of writing that employs symbols from diverse sources such as digits, alchemy signs and Latin or Greek characters. When studying a new, unseen cipher, automatically searching for and grouping ciphers with a similar alphabet can aid the scholar in its transcription and cryptanalysis, because a shared alphabet suggests that the underlying ciphers are likely to be similar. In this study, we address this need by proposing the CSI metric, a novel way of comparing pairs of ciphered documents. We assess its effectiveness in an unsupervised clustering scenario utilising visual features, including SIFT, pre-trained learnt embeddings, and OCR descriptors.

Paper Structure

This paper contains 14 sections, 2 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Summary of the graph-based method for clustering ciphers.
  • Figure 2: Dissimilar datasets often exhibit delayed convergence in the entropy partitioning scheme. Hence, the area under the curve serves as a metric for assessing the separability of features.
  • Figure 3: Samples from the manuscripts employed throughout this study.
  • Figure 4: Examples of incorrectly segmented symbols. a) Segmentation errors due to line detection failure (red) and correct segmentation (green). b) Cluster with wrongly segmented symbols.
  • Figure 5: Baseline results between pairs of ciphered documents using features from a CLIP model, an OCR model trained on generic text, an OCR model trained on handwritten text, a VGG16 and SIFT. The last cell shows the degree of agreement between all feature sources.
  • ...and 2 more figures