Unsupervised Feature Learning for Writer Identification and Writer Retrieval

Vincent Christlein, Martin Gropp, Stefan Fiel, Andreas Maier

TL;DR

The paper tackles writer identification and retrieval on historical documents under limited labeled data by proposing an unsupervised feature learning pipeline. It trains a ResNet using surrogate classes formed from clustering SIFT-based patches, then encodes patch-level activations with multi-codebook VLAD (m-VLAD) and optionally uses Exemplar SVMs for query-adaptive scoring. The approach achieves state-of-the-art results for writer identification and retrieval on Historical-WI and remains competitive on CLaMM16, with notable gains from m-VLAD over other encodings and from using binarized patches. Overall, the work demonstrates that unsupervised, cluster-driven supervision can yield robust, discriminative features for historical document analysis while reducing dependence on manual labels.
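As a rough illustration of the encoding step, the sketch below computes a plain VLAD vector (summed residuals to the nearest codebook center, then L2 normalization) and concatenates the result over several codebooks. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `vlad` and `multi_vlad`, the toy codebooks, and the choice of global L2 normalization are all hypothetical, and any further post-processing used in the paper is not covered here.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vlad(descriptors, codebook):
    """Plain VLAD: sum the residuals to each descriptor's nearest center."""
    dim = len(codebook[0])
    residual_sums = [[0.0] * dim for _ in codebook]
    for d in descriptors:
        # Hard assignment to the nearest codebook center.
        k = min(range(len(codebook)), key=lambda i: sq_dist(d, codebook[i]))
        for j in range(dim):
            residual_sums[k][j] += d[j] - codebook[k][j]
    # Concatenate the per-center residual sums into one vector.
    flat = [v for row in residual_sums for v in row]
    # Assumption: global L2 normalization of the concatenated vector.
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


def multi_vlad(descriptors, codebooks):
    """Concatenate one VLAD vector per codebook (multi-codebook idea)."""
    encoding = []
    for codebook in codebooks:
        encoding.extend(vlad(descriptors, codebook))
    return encoding


# Toy 2-D "activation features" standing in for the patches of one document.
features = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0)]
# Two hypothetical codebooks with two centers each.
codebook_a = [(0.0, 0.0), (10.0, 10.0)]
codebook_b = [(1.0, 1.0), (9.0, 9.0)]

single = vlad(features, codebook_a)
combined = multi_vlad(features, [codebook_a, codebook_b])
```

With two centers and two dimensions, each single-codebook vector has four entries, so the combined encoding over two codebooks has eight. Using several codebooks is a way to reduce the dependence of the encoding on one particular clustering result.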

Abstract

Deep Convolutional Neural Networks (CNNs) have shown great success in supervised classification tasks such as character classification or dating. Deep learning methods typically need a lot of annotated training data, which is not available in many scenarios. In these cases, traditional methods are often better than or equivalent to deep learning methods. In this paper, we propose a simple, yet effective, way to learn CNN activation features in an unsupervised manner. To this end, we train a deep residual network using surrogate classes. The surrogate classes are created by clustering the training dataset, where each cluster index represents one surrogate class. The activations from the penultimate CNN layer serve as features for subsequent classification tasks. We evaluate the feature representations on two publicly available datasets. The focus lies on the ICDAR17 competition dataset on historical document writer identification (Historical-WI). We show that the activation features trained without supervision are superior to descriptors of state-of-the-art writer identification methods. Additionally, we achieve comparable results in the case of handwriting classification using the ICFHR16 competition dataset on historical Latin script types (CLaMM16).
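The surrogate-class idea can be sketched in a few lines: cluster the descriptors, then use each descriptor's cluster index as the training label of the patch taken at the same location. The sketch below is a minimal illustration under stated assumptions, not the authors' code. It assumes k-means as the clustering method (the abstract only says "clustering"), uses a deterministic farthest-first initialization for reproducibility, and works on toy 2-D vectors in place of real SIFT descriptors. All names (`kmeans`, `descriptors`, `patches`, `training_pairs`) are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))


def kmeans(points, n_clusters, n_iter=10):
    """Small k-means with deterministic farthest-first initialization."""
    centers = [list(points[0])]
    while len(centers) < n_clusters:
        # Pick the point that is farthest from all centers chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(n_iter):
        labels = [nearest(p, centers) for p in points]
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous center if a cluster is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, [nearest(p, centers) for p in points]


# Toy 2-D vectors standing in for SIFT descriptors at keypoint locations.
descriptors = [
    (0.0, 0.1), (0.1, 0.0), (0.05, 0.05),
    (5.0, 5.1), (5.1, 5.0), (4.9, 5.0),
]
# Placeholder identifiers for the image patches cut out at the same keypoints.
patches = ["patch_%d" % i for i in range(len(descriptors))]

centers, surrogate_labels = kmeans(descriptors, n_clusters=2)

# Each (patch, cluster index) pair is one training sample for the CNN.
training_pairs = list(zip(patches, surrogate_labels))
```

No writer labels are involved at any point: the cluster index is the only supervision signal, which is what makes the feature learning unsupervised with respect to the actual task.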

Paper Structure

This paper contains 17 sections, 6 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Overview of the unsupervised feature learning. At SIFT keypoint locations, SIFT descriptors and image patches are extracted. The cluster indices of the clustered SIFT descriptors serve as the targets, and the corresponding patches serve as the input for the CNN training.
  • Figure 2: Excerpt of an image of the Historical-WI dataset. Left: Original SIFT keypoints, right: restricted SIFT keypoints.
  • Figure 3: Evaluation of the number of surrogate classes (clusters) using the Historical-WI test data.