Enhancing IMU-Based Online Handwriting Recognition via Contrastive Learning with Zero Inference Overhead
Jindong Li, Dario Zanca, Vincent Christlein, Tim Hamann, Jens Barth, Peter Kämpf, Björn Eskofier
TL;DR
ECHWR improves IMU-based online handwriting recognition by training with a temporary auxiliary branch and dual contrastive losses that align sensor signals with semantic text embeddings. The branch is discarded after training, so inference overhead is zero. The combined objective $L_{total} = L_{CTC} + L_{BC} + L_{EC}$ adds an in-batch contrastive loss (BC) for cross-modal alignment and an error-based contrastive loss (EC) for hard-negative discrimination to the CTC recognition loss. Empirical results on OnHW-Words500 show consistent gains over the REWI baseline: BC drives the improvements on the writer-dependent (WD) split, which tests unseen vocabulary, and EC provides the gains on the writer-independent (WI) split, which tests unseen writers, so the best configuration depends on the generalization scenario. The work highlights the potential of joint contrastive objectives in sequence recognition and suggests scaling to larger datasets and applying the approach to other domains such as speech and OCR.
Abstract
Online handwriting recognition using inertial measurement units opens up handwriting on paper as input for digital devices. Running recognition on edge hardware improves privacy and lowers latency, but imposes memory constraints. To address this, we propose Error-enhanced Contrastive Handwriting Recognition (ECHWR), a training framework designed to improve feature representation and recognition accuracy without increasing inference costs. ECHWR uses a temporary auxiliary branch that aligns sensor signals with semantic text embeddings during the training phase. This alignment is enforced through a dual contrastive objective: an in-batch contrastive loss for general modality alignment and a novel error-based contrastive loss that distinguishes between correct signals and synthetic hard negatives. The auxiliary branch is discarded after training, which allows the deployed model to keep its original, efficient architecture. Evaluations on the OnHW-Words500 dataset show that ECHWR significantly outperforms state-of-the-art baselines, reducing character error rates by up to 7.4% on the writer-independent split and 10.4% on the writer-dependent split. Finally, although our ablation studies indicate that solving specific challenges requires specific architectural and objective configurations, the error-based contrastive loss proves effective for handling unseen writing styles.
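To make the dual contrastive objective more concrete, the following is a minimal NumPy sketch of what the two auxiliary terms could look like. It is not the authors' implementation. It assumes an InfoNCE-style formulation with cosine similarity, a hypothetical temperature of 0.1, and that the synthetic hard negatives are embeddings of corrupted transcriptions; the abstract does not specify any of these details, and all function and parameter names are illustrative.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def in_batch_contrastive_loss(signal_emb, text_emb, temperature=0.1):
    """Assumed form of the in-batch term: symmetric InfoNCE over a batch.

    signal_emb, text_emb: arrays of shape (B, D); row i of each is a matching pair.
    Every other row in the batch serves as a negative.
    """
    s = l2_normalize(signal_emb)
    t = l2_normalize(text_emb)
    logits = s @ t.T / temperature            # (B, B) similarity matrix
    idx = np.arange(len(s))
    loss_s2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # signal -> text
    loss_t2s = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> signal
    return 0.5 * (loss_s2t + loss_t2s)

def error_contrastive_loss(signal_emb, text_emb, neg_text_emb, temperature=0.1):
    """Assumed form of the error-based term: one positive against K hard negatives.

    neg_text_emb: array of shape (B, K, D), hypothetical embeddings of corrupted
    transcriptions (for example, the label with a character substituted).
    """
    s = l2_normalize(signal_emb)
    t = l2_normalize(text_emb)
    n = l2_normalize(neg_text_emb)
    pos = (s * t).sum(axis=-1, keepdims=True)   # (B, 1) similarity to correct text
    neg = np.einsum("bd,bkd->bk", s, n)         # (B, K) similarity to hard negatives
    logits = np.concatenate([pos, neg], axis=1) / temperature
    return -log_softmax(logits, axis=1)[:, 0].mean()  # positive sits at index 0
```

Under these assumptions, the training objective would add both terms to the CTC recognition loss, matching $L_{total} = L_{CTC} + L_{BC} + L_{EC}$. Because both functions operate only on embeddings from the auxiliary branch, dropping that branch after training leaves the recognizer unchanged.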
