Enhancing IMU-Based Online Handwriting Recognition via Contrastive Learning with Zero Inference Overhead

Jindong Li; Dario Zanca; Vincent Christlein; Tim Hamann; Jens Barth; Peter Kämpf; Björn Eskofier

TL;DR

ECHWR improves IMU-based online handwriting recognition by training with a temporary auxiliary branch and dual contrastive losses that align sensor signals with semantic text embeddings, while preserving zero inference overhead. The combined objective $L_{total} = L_{CTC} + L_{BC} + L_{EC}$ adds an in-batch contrastive loss (BC) and an error-based contrastive loss (EC) to the CTC loss, enabling both cross-modal alignment and hard-negative discrimination during training. Empirical results on OnHW-Words500 show consistent gains over the REWI baseline: BC drives the improvements on the writer-dependent (WD) split, which tests unseen vocabulary, while EC provides the gains on the writer-independent (WI) split, which tests unseen writers. The best configuration therefore depends on the generalization scenario. The work highlights the potential of joint contrastive objectives in sequence recognition and suggests scaling to larger datasets and applying the approach to other domains such as speech and OCR.

Abstract

Online handwriting recognition using inertial measurement units (IMUs) opens up handwriting on paper as input for digital devices. Performing recognition on edge hardware improves privacy and lowers latency, but imposes memory constraints. To address this, we propose Error-enhanced Contrastive Handwriting Recognition (ECHWR), a training framework designed to improve feature representation and recognition accuracy without increasing inference costs. ECHWR uses a temporary auxiliary branch that aligns sensor signals with semantic text embeddings during the training phase. This alignment is maintained through a dual contrastive objective: an in-batch contrastive loss for general modality alignment and a novel error-based contrastive loss that distinguishes between correct signals and synthetic hard negatives. The auxiliary branch is discarded after training, which allows the deployed model to keep its original, efficient architecture. Evaluations on the OnHW-Words500 dataset show that ECHWR significantly outperforms state-of-the-art baselines, reducing character error rates by up to 7.4% on the writer-independent split and 10.4% on the writer-dependent split. Finally, although our ablation studies indicate that solving specific challenges requires specific architectural and objective configurations, the error-based contrastive loss proves effective for handling unseen writing styles.
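To make the idea of an in-batch contrastive loss more concrete, the following is a minimal sketch of a symmetric, InfoNCE-style objective over paired sensor and text embeddings. It is an illustration under stated assumptions, not the paper's formulation: the cosine similarity, the `temperature` value, the symmetric averaging, and all function names are assumptions made for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_entropy_row(logits, target_index):
    """Cross-entropy of one row of logits against the index of the true pair."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]


def in_batch_contrastive_loss(sensor_embs, text_embs, temperature=0.1):
    """Symmetric in-batch contrastive loss (assumed InfoNCE-style form).

    sensor_embs[i] and text_embs[i] form a matching pair; every other
    text embedding in the batch serves as a negative for sensor i, and
    vice versa.
    """
    n = len(sensor_embs)
    sim = [
        [cosine_similarity(s, t) / temperature for t in text_embs]
        for s in sensor_embs
    ]
    # Sensor-to-text direction: each row should peak on its diagonal entry.
    sensor_to_text = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    # Text-to-sensor direction: each column should peak on its diagonal entry.
    text_to_sensor = sum(
        cross_entropy_row([sim[j][i] for j in range(n)], i) for i in range(n)
    ) / n
    return 0.5 * (sensor_to_text + text_to_sensor)


# Toy batch: two sensor embeddings and their matching text embeddings.
sensor = [[1.0, 0.0], [0.0, 1.0]]
aligned_text = [[1.0, 0.0], [0.0, 1.0]]
swapped_text = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = in_batch_contrastive_loss(sensor, aligned_text)
swapped_loss = in_batch_contrastive_loss(sensor, swapped_text)
```

In this toy batch the loss is close to zero when each sensor embedding points in the same direction as its own text embedding, and much larger when the pairs are swapped, which is the behavior a modality-alignment loss is meant to reward.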

Paper Structure (21 sections, 3 equations, 4 figures, 2 tables)

Introduction
Related Works
IMU-Based OnHWR
Contrastive Learning
Methods
Model Architecture
Attention Pooling
Text Encoder
Embedding Improvement
Training Objective
In-Batch Contrastive Loss
Error-Based Contrastive Loss
Experiments
Datasets
Implementation Details
...and 6 more sections

Figures (4)

Figure 1: The ECHWR framework. The model trains a primary sensor branch (left) jointly with an auxiliary text branch to align feature representations. The training objective combines CTC loss with two contrastive components: an in-batch contrastive loss (middle) to align matching sensor-text pairs and a novel error-based contrastive loss (right) to discriminate against synthetic "hard negatives." The auxiliary branch is removed during inference to maintain zero computational overhead.
Figure 2: Sensitivity to Negative Sample Diversity. The plots show performance changes across different error set sizes. The central legend applies to all subplots.
Figure 3: UMAP Visualization of the Embedding Space. The plots display projected embeddings of sensor sequences (small dots), ground-truth text anchors (large transparent circles), and hard negative errors (faded dots). The top row shows baseline embeddings from models trained with the BC objective only, while the bottom row illustrates the impact of adding the EC objective.
Figure 4: Character distribution of the right-handed OnHW-Words500 dataset. The upper and lower plots show the character distributions for the first fold of the WD and WI splits, respectively. Blue bars represent character frequencies in the training sets, while orange bars represent frequencies in the validation sets.
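The captions for Figures 1 and 2 refer to synthetic hard negatives and to error sets of different sizes, but this page does not describe how those errors are produced. As a purely hypothetical illustration, the sketch below builds an error set by applying single random character edits (substitution, insertion, deletion) to a ground-truth word. The function name `make_error_set`, the choice of edit operations, and the lowercase alphabet are all assumptions for this example and may differ from the paper's procedure.

```python
import random
import string


def make_error_set(word, num_errors, rng, alphabet=string.ascii_lowercase):
    """Return up to num_errors distinct variants of a non-empty word.

    Each variant differs from the word by one random character edit.
    This is a hypothetical generator, not the procedure used in the paper.
    """
    errors = set()
    attempts = 0
    # Cap the number of attempts so very short words cannot loop forever.
    while len(errors) < num_errors and attempts < 100 * num_errors:
        attempts += 1
        pos = rng.randrange(len(word))
        op = rng.choice(("substitute", "insert", "delete"))
        if op == "substitute":
            candidate = word[:pos] + rng.choice(alphabet) + word[pos + 1:]
        elif op == "insert":
            candidate = word[:pos] + rng.choice(alphabet) + word[pos:]
        else:
            candidate = word[:pos] + word[pos + 1:]
        # Skip empty strings and edits that reproduce the original word.
        if candidate and candidate != word:
            errors.add(candidate)
    return sorted(errors)


rng = random.Random(0)  # fixed seed so the example is reproducible
error_set = make_error_set("hello", 5, rng)
```

Each string in `error_set` is one edit away from the ground truth, so its text embedding would be expected to lie close to the correct one. That closeness is what makes such strings "hard" negatives, and `num_errors` plays the role of the error set size varied in Figure 2.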
