MetaWriter: Personalized Handwritten Text Recognition Using Meta-Learned Prompt Tuning

Wenhao Gu, Li Gu, Ching Yee Suen, Yang Wang

TL;DR

The paper tackles robust handwritten text recognition (HTR) across diverse writer styles by introducing MetaWriter, a parameter-efficient writer personalization framework. It formulates personalization as prompt tuning and uses a self-supervised Masked Autoencoder (MAE) loss to guide adaptation on unlabeled test-time examples, with a meta-learned prompt initialization that aligns the self-supervised and text-prediction objectives. The approach updates less than $1\%$ of the model's parameters and achieves state-of-the-art results on IAM and RIMES while using ~20x fewer trainable parameters than comparable methods, which makes it a practical candidate for on-device deployment. The work advances personalized HTR by combining meta-learning, visual prompts, and self-supervision to enable fast, data-efficient adaptation in realistic, resource-constrained settings.

Abstract

Recent advancements in handwritten text recognition (HTR) have enabled the effective conversion of handwritten text to digital formats. However, achieving robust recognition across diverse writing styles remains challenging. Traditional HTR methods lack writer-specific personalization at test time due to limitations in model architecture and training strategies. Existing attempts to bridge this gap through gradient-based meta-learning still require labeled examples and suffer from parameter-inefficient fine-tuning, leading to substantial computational and memory overhead. To overcome these challenges, we propose an efficient framework that formulates personalization as prompt tuning, incorporating an auxiliary image reconstruction task with a self-supervised loss to guide prompt adaptation with unlabeled test-time examples. To ensure that the self-supervised loss effectively minimizes text recognition error, we leverage meta-learning to learn the optimal initialization of the prompts. As a result, our method allows the model to efficiently capture unique writing styles by updating less than 1% of its parameters and eliminating the need for time-intensive annotation processes. We validate our approach on the RIMES and IAM Handwriting Database benchmarks, where it consistently outperforms previous state-of-the-art methods while using 20x fewer parameters. We believe this represents a significant advancement in personalized handwritten text recognition, paving the way for more reliable and practical deployment in resource-constrained scenarios.

Paper Structure

This paper contains 12 sections, 3 equations, 6 figures, 6 tables, 1 algorithm.

Figures (6)

  • Figure 1: Illustration of the challenge of multi-writer handwriting recognition in the IAM dataset. The left panel shows examples of handwritten text from different writers, highlighting variations in letter shapes, spacing, and stroke patterns. The right panel presents the model’s predictions for each example, demonstrating the difficulty in accurately recognizing diverse handwriting styles.
  • Figure 2: Illustration of performance comparison between our method and other state-of-the-art methods. We report the word error rate (WER) and the number of tunable parameters (both metrics are better when lower). Our method achieves the lowest WER on the IAM dataset, requiring only a negligible number of tunable parameters for personalization.
  • Figure 3: Illustration of our approach during training. The handwritten texts from a specific writer are divided into an unlabeled support set and a labeled query set. The images in the support set are masked, padded with meta prompt vectors, and passed through a shared image encoder, followed by reconstruction using the decoder of a Masked Autoencoder (MAE). The writer-specific prompt vectors $P_j$ are derived in the inner loop using a self-supervised loss $\mathcal{L}_{\text{ada}}$, which optimizes the meta prompt vectors through a single gradient step. These writer-specific prompt vectors are then padded with the document images from the query set and used as input to the HTR model to predict a sequence of tokens representing the writing content. In the outer loop, the learned meta prompts $P$ are updated based on a supervised loss $\mathcal{L}_{\text{pred}}$ computed by comparing the predicted text tokens against the ground truth.
  • Figure 4: Illustration of the capability of Masked AutoEncoders to effectively tackle the HTR problem across three diverse handwriting styles from test data. The first column represents the original input images. The second column displays the images masked by 75%, which serve as the input for the reconstruction process. The third column presents the outputs, demonstrating how the MAE recovers the complete handwritten content from the masked images, effectively adapting to each unique writing style.
  • Figure 5: Illustration of the Character Error Rate (CER) for individual writers (Person IDs 1 to 20) on the IAM dataset. We compare our method with the baseline, showing that our model improves accuracy across all writers.
  • ...and 1 more figure
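
The inner and outer loops described in the Figure 3 caption can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation: simple quadratic functions stand in for the MAE reconstruction loss $\mathcal{L}_{\text{ada}}$ and the recognition loss $\mathcal{L}_{\text{pred}}$, each writer is reduced to a hypothetical (support, query) pair of small vectors, and the names `adapt`, `meta_step`, `ALPHA`, and `BETA` are illustrative choices that do not come from the paper.

```python
# Toy sketch of meta-learned prompt tuning (assumptions: quadratic losses
# replace the real MAE and HTR losses; writers are tiny hand-made vectors).
from typing import List, Tuple

Vector = List[float]
Writer = Tuple[Vector, Vector]  # (support target, query target), both hypothetical

ALPHA = 0.5  # inner-loop step size (illustrative value)
BETA = 0.2   # outer-loop step size (illustrative value)


def adapt(meta_prompt: Vector, support: Vector) -> Vector:
    """Inner loop: one gradient step on L_ada = 0.5 * ||P - support||^2.

    No labels are used here, mirroring adaptation on the unlabeled support set.
    """
    return [p - ALPHA * (p - s) for p, s in zip(meta_prompt, support)]


def pred_loss(prompt: Vector, query: Vector) -> float:
    """Stand-in for L_pred: 0.5 * ||P_j - query||^2 on the labeled query set."""
    return 0.5 * sum((p - q) ** 2 for p, q in zip(prompt, query))


def meta_step(meta_prompt: Vector, writers: List[Writer]) -> Vector:
    """Outer loop: update the meta prompt P through the adapted prompts P_j."""
    grad = [0.0] * len(meta_prompt)
    for support, query in writers:
        adapted = adapt(meta_prompt, support)
        for i, (a, q) in enumerate(zip(adapted, query)):
            # Chain rule: dP_j/dP = (1 - ALPHA) for this quadratic L_ada.
            grad[i] += (1.0 - ALPHA) * (a - q) / len(writers)
    return [p - BETA * g for p, g in zip(meta_prompt, grad)]


def mean_query_loss(meta_prompt: Vector, writers: List[Writer]) -> float:
    """Average L_pred after per-writer adaptation from the given initialization."""
    total = sum(pred_loss(adapt(meta_prompt, s), q) for s, q in writers)
    return total / len(writers)


# Three hypothetical writers; each query target is the support target plus (1, 1).
writers: List[Writer] = [
    ([1.0, 0.0], [2.0, 1.0]),
    ([0.0, 1.0], [1.0, 2.0]),
    ([-1.0, -1.0], [0.0, 0.0]),
]

meta_prompt: Vector = [0.0, 0.0]
loss_before = mean_query_loss(meta_prompt, writers)
for _ in range(200):
    meta_prompt = meta_step(meta_prompt, writers)
loss_after = mean_query_loss(meta_prompt, writers)

print(f"query loss before meta-training: {loss_before:.4f}")
print(f"query loss after meta-training:  {loss_after:.4f}")
```

In this toy setting the outer loop moves the initialization so that a single unsupervised step on the support loss also lowers the supervised query loss, which is the alignment between the two objectives that the caption describes. A real system would replace the closed-form gradients with automatic differentiation through the encoder and decoder.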