Offline Writer Identification Using Convolutional Neural Network Activation Features

Vincent Christlein, David Bernecker, Andreas Maier, Elli Angelopoulou

TL;DR

The paper tackles offline writer identification by using CNN activation features as local descriptors and aggregating them into a global descriptor with a mean-adapted GMM supervector followed by KL-based normalization. It systematically analyzes CNN architectures, whitening, and encoding choices, and shows that activation features outperform traditional local descriptors on ICDAR13 and CVL, with an absolute gain of about 0.21 mAP on the challenging ICDAR13 dataset. The approach combines a CNN trained for writer classification on image patches with a MAP-adapted $K$-component GMM and a KL-derived normalization that draws on the information in the GMM weights and covariances while reducing dimensionality. The results indicate strong transferability across handwriting datasets and languages, which suggests practical utility for forensic and archival tasks, and they point to future improvements through larger networks and higher-order encoding techniques.

Abstract

Convolutional neural networks (CNNs) have recently become the state-of-the-art tool for large-scale image classification. In this work we propose the use of activation features from CNNs as local descriptors for writer identification. A global descriptor is then formed by means of GMM supervector encoding, which is further improved by normalization with the KL-Kernel. We evaluate our method on two publicly available datasets: the ICDAR 2013 benchmark database and the CVL dataset. While we perform comparably to the state of the art on CVL, our proposed method yields about 0.21 absolute improvement in terms of mAP on the challenging bilingual ICDAR dataset.
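
Since the headline result is reported as mAP, the following minimal sketch shows how mean average precision is commonly computed for a retrieval-style evaluation. It is an illustration, not the authors' evaluation code: the leave-one-out protocol, the cosine similarity ranking, and the function names `average_precision` and `mean_average_precision` are all assumptions made for this example.

```python
import numpy as np

def average_precision(ranked_relevance):
    """AP for one query, given a ranked list of relevance flags (best match first)."""
    rel = np.asarray(ranked_relevance, dtype=bool)
    if not rel.any():
        return 0.0  # no relevant document in the list
    hits = np.cumsum(rel)                      # relevant items seen up to each rank
    ranks = np.arange(1, len(rel) + 1)
    precision_at_k = hits / ranks
    return float(precision_at_k[rel].mean())   # average precision at the relevant ranks

def mean_average_precision(descriptors, writer_ids):
    """Leave-one-out mAP (assumed protocol): each document queries all the others."""
    D = np.asarray(descriptors, dtype=float)
    D = D / np.linalg.norm(D, axis=1, keepdims=True)  # unit length, so dot = cosine
    ids = np.asarray(writer_ids)
    sims = D @ D.T
    aps = []
    for q in range(len(ids)):
        order = np.argsort(-sims[q])           # most similar first
        order = order[order != q]              # drop the query itself
        aps.append(average_precision(ids[order] == ids[q]))
    return float(np.mean(aps))
```

Under this definition, an absolute improvement of 0.21 means the average of the per-query AP values rises by 0.21 on a scale from 0 to 1.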

Paper Structure (16 sections, 5 equations, 2 figures, 4 tables)


Figures (2)

  • Figure 1: Overview of the encoding process. The two main steps are feature extraction using a pretrained CNN and the encoding step, in which the local features are aggregated using a pretrained GMM.
  • Figure 2: Schematic representation of the CNN used. C1 and C2 are convolutional layers (red connections). P1 and P2 are max pooling layers (blue connections). The last three layers are fully connected (gray connections). After training, only the part of the net inside the dashed box (activation features) is kept. The activations of the hidden layer become the local descriptor for the image patch.
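
The encoding step summarized in Figure 1 can be sketched as follows, assuming the local descriptors have already been extracted by the CNN. This is a hedged illustration rather than the authors' implementation: it assumes a diagonal-covariance GMM, the standard relevance-factor form of MAP mean adaptation, the usual KL-kernel scaling of each adapted mean by $\sqrt{w_k}$ and the inverse standard deviation, and a final L2 normalization. The function names and the default `relevance=16.0` are hypothetical choices for this example.

```python
import numpy as np

def gmm_posteriors(X, weights, means, variances):
    """Soft assignment of each descriptor (row of X) to each diagonal Gaussian."""
    diff = X[:, None, :] - means[None, :, :]                  # (T, K, D)
    log_prob = -0.5 * (
        np.sum(diff ** 2 / variances[None], axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None]
    )                                                          # (T, K)
    log_post = np.log(weights)[None] + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)           # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def gmm_supervector(X, weights, means, variances, relevance=16.0):
    """Mean-adapted GMM supervector with KL-kernel normalization (assumed form)."""
    post = gmm_posteriors(X, weights, means, variances)
    n_k = post.sum(axis=0)                                     # soft counts, (K,)
    e_k = (post.T @ X) / np.maximum(n_k, 1e-12)[:, None]       # data means per component
    alpha = (n_k / (n_k + relevance))[:, None]                 # adaptation coefficients
    adapted = alpha * e_k + (1.0 - alpha) * means              # MAP-adapted means
    # KL-kernel normalization: scale by sqrt(weight) and inverse standard deviation
    normalized = np.sqrt(weights)[:, None] * adapted / np.sqrt(variances)
    sv = normalized.ravel()                                    # K * D global descriptor
    return sv / np.linalg.norm(sv)                             # L2 normalization (assumption)
```

In this sketch only the means are adapted, so the global descriptor has $K \cdot D$ entries, while the weights and variances of the background GMM enter through the normalization.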