Representation Learning for Tablet and Paper Domain Adaptation in Favor of Online Handwriting Recognition
Felix Ott, David Rügamer, Lucas Heublein, Bernd Bischl, Christopher Mutschler
TL;DR
This work tackles cross-domain online handwriting (OnHW) recognition between tablet and paper by learning a shared, domain-invariant representation through supervised domain adaptation (DA). It combines a dual-network architecture with deep metric learning, employing triplet losses and large-margin sampling guided by edit distance, and aligns feature distributions via higher-order moment matching (HoMM) and correlation alignment (CORAL) across multiple fusion points. The approach is evaluated on sequence-based OnHW datasets and shows substantial improvements. HoMM of order 3 at an intermediate fusion point (c=3) achieves the strongest gains (e.g., 13.45% WER and 2.68% CER reductions) when paired with language-model post-processing. The study demonstrates that careful integration of DA losses, dynamic triplet sampling, and moment-based alignment can effectively mitigate cross-domain shifts in IMU-based handwriting data, enabling more robust cross-device handwriting recognition in practical settings.
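To make the moment-based alignment concrete, the following is a minimal NumPy sketch of a third-order moment matching loss between two feature batches. It is an illustration under stated assumptions, not the authors' implementation: the function name `homm3_loss`, the (n, d) batch layout, the normalization by d**3, and the random stand-in features are all hypothetical choices made for this example.

```python
import numpy as np


def homm3_loss(source, target):
    """Squared distance between third-order moments of two (n, d) batches.

    Assumption: each row is one feature vector taken at some fusion point.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    d = source.shape[1]
    # Batch mean of the third-order outer product h (x) h (x) h, shape (d, d, d).
    m_s = np.einsum("ni,nj,nk->ijk", source, source, source) / source.shape[0]
    m_t = np.einsum("ni,nj,nk->ijk", target, target, target) / target.shape[0]
    # Squared Frobenius norm of the difference, normalized by d**3 (assumed).
    return float(np.sum((m_s - m_t) ** 2) / d ** 3)


# Random stand-ins for tablet and paper features (hypothetical data).
rng = np.random.default_rng(0)
tablet = rng.normal(size=(32, 8))
paper = 1.5 * rng.normal(size=(32, 8)) + 0.3

print("same domain:", homm3_loss(tablet, tablet))
print("cross domain:", homm3_loss(tablet, paper))
```

The full moment tensor has d**3 entries, so this direct form is only practical for small feature dimensions; a training setup would typically need a cheaper approximation.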
Abstract
The performance of a machine learning model degrades when it is applied to data from a domain that is similar to, but different from, the one it was trained on. The goal of domain adaptation (DA) is to mitigate this domain shift problem by searching for an optimal feature transformation to learn a domain-invariant representation. Such a domain shift can appear in handwriting recognition (HWR) applications, where the motion pattern of the hand, and with it the motion pattern of the pen, differs between writing on paper and writing on a tablet. This becomes visible in the sensor data for online handwriting (OnHW) from pens with integrated inertial measurement units. This paper proposes a supervised DA approach to enhance learning for OnHW recognition between tablet and paper data. Our method exploits loss functions such as maximum mean discrepancy and correlation alignment to learn a domain-invariant feature representation (i.e., similar covariances between tablet and paper features). We use a triplet loss that takes negative samples from the auxiliary domain (i.e., paper samples) to increase the number of samples of the tablet dataset. We conduct an evaluation on novel sequence-based OnHW datasets (i.e., words) and show an improvement on the paper domain with an early fusion strategy by using pairwise learning.
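The idea of "similar covariances between tablet and paper features" can be sketched with a correlation alignment loss. The NumPy example below assumes the commonly used formulation (squared Frobenius distance between the two feature covariance matrices, scaled by 1/(4 d^2)); the function name `coral_loss` and the random stand-in features are hypothetical, and the paper's actual loss may differ in detail.

```python
import numpy as np


def coral_loss(source, target):
    """Covariance alignment between two (n, d) feature batches.

    Assumption: rows are samples, columns are feature dimensions.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    d = source.shape[1]
    # Feature covariance matrices, each of shape (d, d).
    c_s = np.cov(source, rowvar=False)
    c_t = np.cov(target, rowvar=False)
    # Squared Frobenius distance, scaled by 1 / (4 d^2).
    return float(np.sum((c_s - c_t) ** 2) / (4 * d ** 2))


# Random stand-ins for tablet and paper features (hypothetical data).
rng = np.random.default_rng(0)
tablet = rng.normal(size=(64, 8))
paper = 2.0 * rng.normal(size=(64, 8))

print("tablet vs tablet:", coral_loss(tablet, tablet))
print("tablet vs paper:", coral_loss(tablet, paper))
```

Because covariances ignore the mean, this loss is insensitive to a constant offset between the two domains and only penalizes differences in second-order statistics.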
