Enhancing IMU-Based Online Handwriting Recognition via Contrastive Learning with Zero Inference Overhead

Jindong Li, Dario Zanca, Vincent Christlein, Tim Hamann, Jens Barth, Peter Kämpf, Björn Eskofier

TL;DR

ECHWR improves IMU-based online handwriting recognition by training with a temporary auxiliary branch and dual contrastive losses that align sensor signals with semantic text embeddings; the branch is discarded after training, so inference incurs no overhead. The combined objective $L_{total} = L_{CTC} + L_{BC} + L_{EC}$ adds an in-batch contrastive loss (BC) for cross-modal alignment and an error-based contrastive loss (EC) for hard-negative discrimination to the CTC loss. Empirical results on OnHW-Words500 show consistent gains over the REWI baseline: BC drives improvements on the writer-dependent (WD) split, which tests unseen vocabulary, and EC provides gains on the writer-independent (WI) split, which tests unseen writers. The best configuration depends on the generalization scenario. The work highlights the potential of joint contrastive objectives in sequence recognition and suggests scaling to larger datasets and applying the approach to other domains such as speech and OCR.
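
As a rough illustration of how an in-batch contrastive term and the summed objective might look, the sketch below uses a symmetric InfoNCE-style loss over paired sensor and text embeddings. The summary only names the losses, so the exact form, the `temperature` value, and all function names here are assumptions for illustration, not the authors' implementation.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cross_entropy_row(logits, target):
    # Numerically stable -log softmax(logits)[target].
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def in_batch_contrastive(sensor_emb, text_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss (assumed form of the BC term).

    Row i of sensor_emb is paired with row i of text_emb; every other
    row in the batch acts as a negative.
    """
    s = [normalize(v) for v in sensor_emb]
    t = [normalize(v) for v in text_emb]
    n = len(s)
    # Cosine similarities scaled by the temperature.
    sim = [[dot(s[i], t[j]) / temperature for j in range(n)] for i in range(n)]
    sensor_to_text = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    text_to_sensor = sum(
        cross_entropy_row([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (sensor_to_text + text_to_sensor)


def total_loss(l_ctc, l_bc, l_ec):
    # Unweighted sum, matching L_total = L_CTC + L_BC + L_EC.
    return l_ctc + l_bc + l_ec


sensor = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_contrastive(sensor, [[1.0, 0.0], [0.0, 1.0]])
mismatched = in_batch_contrastive(sensor, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < mismatched)
```

In this toy batch the loss is small when each sensor embedding points at its own text embedding and large when the pairs are swapped, which is the behaviour a cross-modal alignment term is meant to reward.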

Abstract

Online handwriting recognition using inertial measurement units opens up handwriting on paper as input for digital devices. Running recognition on edge hardware improves privacy and lowers latency, but imposes memory constraints. To address this, we propose Error-enhanced Contrastive Handwriting Recognition (ECHWR), a training framework designed to improve feature representation and recognition accuracy without increasing inference costs. ECHWR uses a temporary auxiliary branch that aligns sensor signals with semantic text embeddings during the training phase. This alignment is maintained through a dual contrastive objective: an in-batch contrastive loss for general modality alignment and a novel error-based contrastive loss that distinguishes between correct signals and synthetic hard negatives. The auxiliary branch is discarded after training, which allows the deployed model to keep its original, efficient architecture. Evaluations on the OnHW-Words500 dataset show that ECHWR significantly outperforms state-of-the-art baselines, reducing character error rates by up to 7.4% on the writer-independent split and 10.4% on the writer-dependent split. Finally, although our ablation studies indicate that solving specific challenges requires specific architectural and objective configurations, the error-based contrastive loss proves effective for handling unseen writing styles.
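
For readers unfamiliar with the metric, character error rate is conventionally the character-level edit distance between prediction and reference, divided by the number of reference characters. The sketch below shows that standard definition; the function names are illustrative, and the evaluation code behind the reported numbers may differ in details such as normalization or averaging.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total edits divided by total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return errors / chars


# One substitution in "hello" and one deletion in "world": 2 edits over 10 characters.
score = cer(["hello", "world"], ["hallo", "word"])
print(score)  # 0.2
```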

Paper Structure (21 sections, 3 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: The ECHWR framework. The model trains a primary sensor branch (left) jointly with an auxiliary text branch to align feature representations. The training objective combines CTC loss with two contrastive components: an in-batch contrastive loss (middle) to align matching sensor-text pairs and a novel error-based contrastive loss (right) to discriminate against synthetic "hard negatives." The auxiliary branch is removed during inference to maintain zero computational overhead.
  • Figure 2: Sensitivity to Negative Sample Diversity. The plots show performance changes across different error set sizes. The central legend applies to all subplots.
  • Figure 3: UMAP Visualization of the Embedding Space. The plots display projected embeddings of sensor sequences (small dots), ground-truth text anchors (large transparent circles), and hard negative errors (faded dots). The top row shows baseline embeddings from models trained with the BC objective only, while the bottom row illustrates the impact of adding the EC objective.
  • Figure 4: Character distribution of the right-handed OnHW-Words500 dataset. The upper and lower plots show the character distributions for the first fold of the WD and WI splits, respectively. Blue bars represent character frequencies in the training sets, while orange bars represent frequencies in the validation sets.
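
The error-based contrastive component described in Figure 1 could be sketched roughly as follows. The captions do not say how the synthetic hard negatives are produced or how the loss is formulated, so the single-character edits, the lowercase alphabet, the softmax-over-candidates loss, and every name below are assumptions made purely for illustration.

```python
import math
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumed character set


def make_hard_negatives(word, k, rng):
    """Return up to k distinct strings one character edit away from a non-empty word."""
    negatives = set()
    attempts = 0
    while len(negatives) < k and attempts < 100 * k:
        attempts += 1
        pos = rng.randrange(len(word))
        op = rng.choice(("substitute", "delete", "insert"))
        if op == "substitute":
            cand = word[:pos] + rng.choice(ALPHABET) + word[pos + 1:]
        elif op == "delete":
            cand = word[:pos] + word[pos + 1:]
        else:
            cand = word[:pos] + rng.choice(ALPHABET) + word[pos:]
        if cand and cand != word:
            negatives.add(cand)
    return sorted(negatives)


def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den


def error_contrastive(sensor_vec, true_text_vec, error_text_vecs, temperature=0.1):
    """Cross-entropy over [true text] + [error texts], with the true text as target."""
    logits = [cosine(sensor_vec, true_text_vec) / temperature]
    logits += [cosine(sensor_vec, e) / temperature for e in error_text_vecs]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]


negs = make_hard_negatives("hello", 5, random.Random(0))
good = error_contrastive([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
bad = error_contrastive([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(negs, good < bad)
```

The size of the negative set (`k` here) plays the role of the "error set size" varied in Figure 2. In a real model the text vectors would come from the auxiliary text branch rather than being hand-written.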