Realistic Handwritten Multi-Digit Writer (MDW) Number Recognition Challenges
Kiri L. Wagstaff
TL;DR
The paper introduces multi-digit writer (MDW) benchmarks: realistic sequences of handwritten digits in which every digit in a number comes from a single writer, built using NIST/QMNIST writer metadata. It presents three domains (MDW-ZIP-Codes, MDW-Check-Amounts, and MDW-Clock-Times), each with domain-specific validity constraints and task-specific evaluation metrics that go beyond per-digit accuracy. Using the provided generation scripts and replication commands, it demonstrates how MNIST-trained models can be evaluated on these realistic tasks and explores phenomena such as geographical bias and cost-sensitive errors. The work aims to spur advances in writer-aware multi-digit recognition and offers a framework for extending these benchmarks to other handwritten number tasks, while acknowledging limitations related to segmentation dynamics and dataset diversity.
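The single-writer generation idea can be illustrated with a minimal Python sketch. It is not the paper's generation script: the record layout `(writer_id, digit_label, image)`, the helper names, and the 24-hour `HHMM` validity rule are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def group_by_writer(records):
    """Index images as by_writer[writer_id][digit_label] -> list of images.

    `records` is a hypothetical iterable of (writer_id, digit_label, image).
    """
    by_writer = defaultdict(lambda: defaultdict(list))
    for writer_id, label, image in records:
        by_writer[writer_id][label].append(image)
    return by_writer


def is_valid_clock_time(digits):
    """Assumed validity rule: four digits HHMM with HH <= 23 and MM <= 59."""
    hh = digits[0] * 10 + digits[1]
    mm = digits[2] * 10 + digits[3]
    return hh <= 23 and mm <= 59


def sample_number(by_writer, n_digits, is_valid, rng, max_tries=1000):
    """Draw one valid number whose digit images all come from one writer."""
    for _ in range(max_tries):
        digits = [rng.randrange(10) for _ in range(n_digits)]
        if not is_valid(digits):
            continue  # reject digit strings that violate the domain rule
        # Keep only writers who have at least one image of every needed digit.
        eligible = sorted(
            w for w, imgs in by_writer.items() if all(d in imgs for d in digits)
        )
        if not eligible:
            continue
        writer = rng.choice(eligible)
        images = [rng.choice(by_writer[writer][d]) for d in digits]
        return writer, digits, images
    raise RuntimeError("no valid sample found")


# Toy usage: strings stand in for digit images.
toy_records = [(w, d, f"w{w}-d{d}") for w in ("a", "b") for d in range(10)]
writer, digits, images = sample_number(
    group_by_writer(toy_records), 4, is_valid_clock_time, random.Random(0)
)
```

Rejection sampling keeps the validity rule separate from the sampling loop, so a different domain would only need a different `is_valid` function.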
Abstract
Isolated digit classification has served as a motivating problem for decades of machine learning research. In real settings, numbers often occur as multiple digits, all written by the same person. Examples include ZIP Codes, handwritten check amounts, and appointment times. In this work, we leverage knowledge about the writers of NIST digit images to create more realistic benchmark multi-digit writer (MDW) data sets. As expected, we find that classifiers may perform well on isolated digits yet do poorly on multi-digit number recognition. If we want to solve real number recognition problems, additional advances are needed. The MDW benchmarks come with task-specific performance metrics that go beyond typical error calculations to more closely align with real-world impact. They also create opportunities to develop methods that can leverage task-specific knowledge to improve performance well beyond that of individual digit classification methods.
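To see why strong isolated-digit accuracy need not carry over to whole numbers, consider a simplifying assumption (not a claim from the paper): if each digit is classified correctly with probability p and errors are independent, an n-digit number is fully correct with probability p^n. With p = 0.99 and n = 5, that is about 0.951. Errors on digits from the same writer are likely correlated, so this is only a rough guide. The sketch below contrasts the two measures on a toy example; the function names are illustrative and these are not the paper's task-specific metrics.

```python
def digit_accuracy(preds, targets):
    """Fraction of individual digits classified correctly."""
    total = sum(len(t) for t in targets)
    correct = sum(
        p == t for ps, ts in zip(preds, targets) for p, t in zip(ps, ts)
    )
    return correct / total


def number_accuracy(preds, targets):
    """Fraction of numbers in which every digit is correct (exact match)."""
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


# Toy example: one wrong digit out of fifteen.
targets = ["90210", "10001", "60614"]
preds = ["90210", "10007", "60614"]

per_digit = digit_accuracy(preds, targets)    # 14 of 15 digits correct
per_number = number_accuracy(preds, targets)  # 2 of 3 numbers correct
```

A single misread digit lowers per-digit accuracy only slightly, yet it invalidates the entire number.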
