Realistic Handwritten Multi-Digit Writer (MDW) Number Recognition Challenges

Kiri L. Wagstaff

TL;DR

The paper introduces multi-digit writer (MDW) benchmarks that generate realistic sequences of handwritten digits, each sequence written by a single writer, using NIST/QMNIST writer metadata. It presents three domains (MDW-ZIP-Codes, MDW-Check-Amounts, and MDW-Clock-Times), each with domain-specific validity constraints and bespoke evaluation metrics that go beyond per-digit accuracy. It provides generation scripts and replication commands, shows how MNIST-trained models can be evaluated on these tasks, and explores phenomena such as geographical bias and cost-sensitive errors. The work aims to spur advances in writer-aware multi-digit recognition and offers a framework for extending these benchmarks to other handwritten number tasks, while acknowledging limitations related to segmentation dynamics and dataset diversity.

Abstract

Isolated digit classification has served as a motivating problem for decades of machine learning research. In real settings, numbers often occur as multiple digits, all written by the same person. Examples include ZIP Codes, handwritten check amounts, and appointment times. In this work, we leverage knowledge about the writers of NIST digit images to create more realistic benchmark multi-digit writer (MDW) data sets. As expected, we find that classifiers may perform well on isolated digits yet do poorly on multi-digit number recognition. If we want to solve real number recognition problems, additional advances are needed. The MDW benchmarks come with task-specific performance metrics that go beyond typical error calculations to more closely align with real-world impact. They also create opportunities to develop methods that can leverage task-specific knowledge to improve performance well beyond that of individual digit classification methods.
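
As a rough illustration of the single-writer idea described above, the following minimal sketch assembles a multi-digit item from digit samples that all share one writer ID. It is not the paper's generation script: the `(label, writer)` tuple layout, the helper names `index_by_writer` and `sample_item`, and the toy data are all assumptions made for this example, and repeated digits are allowed to reuse the same image.

```python
import random
from collections import defaultdict


def index_by_writer(samples):
    """Group sample indices by writer ID, then by digit label.

    `samples` is a hypothetical list of (label, writer) pairs standing in
    for per-image metadata such as that shipped with QMNIST.
    """
    by_writer = defaultdict(lambda: defaultdict(list))
    for idx, (label, writer) in enumerate(samples):
        by_writer[writer][label].append(idx)
    return by_writer


def sample_item(by_writer, number, rng):
    """Return (writer, indices) for `number`, using one writer for all digits."""
    digits = [int(ch) for ch in number]
    # Only writers who have produced every required digit are eligible.
    eligible = sorted(
        w for w, by_label in by_writer.items()
        if all(d in by_label for d in digits)
    )
    if not eligible:
        raise ValueError(f"no single writer covers all digits of {number!r}")
    writer = rng.choice(eligible)
    # Assumption: a repeated digit may reuse the same image (with replacement).
    indices = [rng.choice(by_writer[writer][d]) for d in digits]
    return writer, indices


# Toy metadata: writer "w1" has digits 9, 5, 1, 0; writer "w2" has only 9 and 5.
samples = [(9, "w1"), (5, "w1"), (9, "w2"), (1, "w1"), (5, "w2"), (0, "w1")]
by_writer = index_by_writer(samples)
writer, indices = sample_item(by_writer, "91905", random.Random(0))
```

In this toy data only `w1` has written every digit in `91905`, so every selected image index comes from that writer. A real benchmark would additionally enforce the domain's validity constraints (for example, restricting ZIP Codes to ones that exist).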

Paper Structure

This paper contains 18 sections, 3 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: What is the ambiguous final digit? (a) Random choices for the first four digits provide no clues. (b) Knowing that the 9 in the middle position was created by the same writer increases our confidence that the final digit is not a 9. Indeed, it is labeled as a 5.
  • Figure 2: Example 5-digit item (by writer 3688) from MDW-ZIP-Codes and its visualization.
  • Figure 3: Digit distribution and learning curves for the MNIST test set versus U.S. ZIP Codes.
  • Figure 4: Geographical distribution of U.S. ZIP Code recognition error for two classifiers, per ZIP Code sector. Overall, the VGG-like CNN had the lowest error rate, but it showed a much higher error rate than the SVM in specific sectors.
  • Figure 5: Two check amount items from MDW-Check-Amounts ($76,805.81 and $7.52) and their visualizations. -1 indicates no digit, allowing for variable length numbers.
  • ...and 1 more figure