Capturing More: Learning Multi-Domain Representations for Robust Online Handwriting Verification

Peirong Zhang, Kai Ding, Lianwen Jin

TL;DR

This work tackles the limitation of temporal-only representations in online handwriting verification (OHV) by introducing SPECTRUM, a temporal-frequency multi-domain model. It realizes micro-to-macro multi-domain integration (M3I) through a multi-scale interactor for fine-grained temporal-frequency interaction and a self-gated fusion module for global feature balancing, complemented by a multi-domain distance-based verifier (MDV) that fuses DTW-based temporal distances with Euclidean frequency distances. Experiments on MSDS-ChS, MSDS-TDS, and DeepSignDB demonstrate consistent improvements over state-of-the-art OHV methods, and reveal that combining multiple handwritten biometrics (e.g., Chinese signatures and token digit strings) further enhances discrimination. The findings suggest multi-domain learning across both feature and biometric domains offers a robust path to improve OHV in real-world applications, with public code available for reproducibility. Key components include the use of STFT-based spectrograms, a micro-scale interactor that applies $1\times1$ convolutions to temporal paths and a learnable 1D DFT on frequency paths, and the MDV decision mechanism that adaptively weights temporal penalties by frequency cues.
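The two operations named for the micro-scale interactor can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, tensor shapes, and the choice to apply the DFT along the time axis are assumptions made for illustration. It shows only that a $1\times1$ convolution over a sequence is a per-time-step linear map, and that a DFT can be written as a matrix product whose matrix could be treated as a trainable parameter initialized from the standard DFT.

```python
import numpy as np

def pointwise_conv(x, weight, bias):
    """1x1 convolution over a sequence: the same linear map at every time step.

    x: (T, C_in), weight: (C_in, C_out), bias: (C_out,)  -> (T, C_out)
    """
    return x @ weight + bias

def dft_matrix(n):
    """Standard n-point DFT matrix; a 'learnable DFT' could start from this
    matrix and update it during training (assumption, not from the paper)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 4))            # hypothetical sequence: 16 steps, 4 channels

# Temporal path: channel mixing with a 1x1 convolution.
temporal = pointwise_conv(x, rng.normal(size=(4, 8)), np.zeros(8))

# Frequency path: DFT along the time axis, expressed as a matrix product.
W = dft_matrix(16)
frequency = W @ x
```

With the untrained matrix, `W @ x` matches `np.fft.fft(x, axis=0)`, which is a convenient sanity check before any parameters are updated.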

Abstract

In this paper, we propose SPECTRUM, a temporal-frequency synergistic model that unlocks the untapped potential of multi-domain representation learning for online handwriting verification (OHV). SPECTRUM comprises three core components: (1) a multi-scale interactor that finely combines temporal and frequency features through dual-modal sequence interaction and multi-scale aggregation; (2) a self-gated fusion module that dynamically integrates global temporal and frequency features via self-driven balancing; and (3) a multi-domain distance-based verifier that utilizes both temporal and frequency representations to improve discrimination between genuine and forged handwriting, surpassing conventional temporal-only approaches. The first two components work synergistically to achieve micro-to-macro spectral-temporal integration. Extensive experiments demonstrate SPECTRUM's superior performance over existing OHV methods, underscoring the effectiveness of temporal-frequency multi-domain learning. Furthermore, we reveal that incorporating multiple handwritten biometrics fundamentally enhances the discriminative power of handwriting representations and facilitates verification. These findings not only validate the efficacy of multi-domain learning in OHV but also pave the way for future research in multi-domain approaches across both feature and biometric domains. Code is publicly available at https://github.com/NiceRingNode/SPECTRUM.
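As a rough illustration of a verifier that uses both domains, the sketch below combines a DTW distance between temporal feature sequences with a Euclidean distance between pooled frequency vectors. The DTW recursion is the textbook one; the fusion rule in `fused_distance` (scaling the temporal distance by the frequency distance) and the parameter `alpha` are hypothetical stand-ins, since the exact rule used by the paper's verifier is not given in this summary.

```python
import numpy as np

def dtw_distance(a, b):
    """Textbook DTW between sequences a (N, D) and b (M, D) with Euclidean
    local cost, normalized by N + M so lengths are roughly comparable."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m] / (n + m)

def fused_distance(temp_a, temp_b, freq_a, freq_b, alpha=1.0):
    """Hypothetical fusion: the frequency distance scales the temporal one.

    temp_*: (T, D) temporal feature sequences; freq_*: (K,) frequency vectors.
    alpha is an illustrative weight, not a parameter from the paper.
    """
    d_t = dtw_distance(temp_a, temp_b)
    d_f = float(np.linalg.norm(np.asarray(freq_a) - np.asarray(freq_b)))
    return d_t * (1.0 + alpha * d_f)
```

A query would then be accepted as genuine when its fused distance to the enrolled reference falls below a threshold chosen on validation data; the threshold itself is outside the scope of this sketch.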

Paper Structure

This paper contains 28 sections, 11 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Spectrograms of time-domain features extracted by short-time Fourier transform (STFT) on genuine and forged handwriting samples, in which angular acceleration and pressure are taken as example features. The frequency responses of genuine and forged handwriting showcase obvious discrepancies. Hence, frequency modeling offers another discriminative perspective and can be combined with temporal features to achieve multi-domain discrimination.
  • Figure 2: Overall framework of SPECTRUM. Top: Model training process. Middle: Detailed architecture of SPECTRUM, which mainly consists of two stacked micro-to-macro multi-domain integration (M$^3$I) blocks, a GRU, and a selective pooling (SP) layer [sig2vec2022lai]. The last M$^3$I block exclusively outputs frequency features, which are pooled to yield $f_F$. Bottom: Model inference (verification) process, where MDV harnesses both temporal and frequency representations to enhance verification accuracy.
  • Figure 3: Schematic of the multi-scale interactor.
  • Figure 4: Schematic of the self-gated fusion module.
  • Figure 5: Visualization of the final feature representations on Chinese signature and Token Digit String data from MSDS-ChS and MSDS-TDS [msds2022zhang]. The "Temporal" features are produced by the Baseline model (described in the ablation section), which involves only temporal-domain learning, while the "Temporal & Frequency" features are obtained from our SPECTRUM. The handwritten data are desensitized through cropping to protect privacy.
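The spectrograms in Figure 1 come from applying an STFT to a time-domain feature such as pressure. A minimal NumPy-only sketch of that computation follows; the window length, hop size, sampling rate, and the synthetic signal are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def stft_spectrogram(x, win_len=32, hop=8):
    """Magnitude spectrogram of a 1D signal using a Hann window.

    Returns an array of shape (win_len // 2 + 1, n_frames).
    """
    x = np.asarray(x, dtype=float)
    if len(x) < win_len:                      # zero-pad very short signals
        x = np.pad(x, (0, win_len - len(x)))
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Synthetic stand-in for a pen feature: a 16 Hz oscillation sampled at 128 Hz.
fs = 128
t = np.arange(256) / fs
pressure = np.sin(2 * np.pi * 16 * t)

spec = stft_spectrogram(pressure)
peak_bin = int(np.argmax(spec.mean(axis=1)))  # dominant frequency bin
peak_hz = peak_bin * fs / 32                  # bin index -> Hz for win_len=32
```

Comparing such spectrograms for a genuine and a forged sample of the same feature is one way to inspect the frequency-domain discrepancies the caption refers to.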