MetaWriter: Personalized Handwritten Text Recognition Using Meta-Learned Prompt Tuning
Wenhao Gu, Li Gu, Ching Yee Suen, Yang Wang
TL;DR
The paper tackles robust handwritten text recognition (HTR) across diverse writer styles by introducing MetaWriter, a parameter-efficient writer personalization framework. It formulates personalization as prompt tuning and uses a self-supervised Masked Autoencoder (MAE) reconstruction loss to guide adaptation on unlabeled test-time examples. The prompt initialization is meta-learned so that minimizing the self-supervised loss also reduces text recognition error. The approach updates less than 1% of the model's parameters and achieves state-of-the-art results on IAM and RIMES while using about 20x fewer parameters than previous methods, which makes it a good fit for resource-constrained deployment. By combining meta-learning, visual prompts, and self-supervision, the work enables fast, annotation-free adaptation to individual writers.
Abstract
Recent advancements in handwritten text recognition (HTR) have enabled the effective conversion of handwritten text to digital formats. However, achieving robust recognition across diverse writing styles remains challenging. Traditional HTR methods lack writer-specific personalization at test time due to limitations in model architecture and training strategies. Existing attempts to bridge this gap through gradient-based meta-learning still require labeled examples and suffer from parameter-inefficient fine-tuning, leading to substantial computational and memory overhead. To overcome these challenges, we propose an efficient framework that formulates personalization as prompt tuning, incorporating an auxiliary image reconstruction task with a self-supervised loss to guide prompt adaptation with unlabeled test-time examples. To ensure that minimizing the self-supervised loss also reduces text recognition error, we leverage meta-learning to learn the optimal initialization of the prompts. As a result, our method allows the model to efficiently capture unique writing styles by updating less than 1% of its parameters, while eliminating the need for time-intensive annotation. We validate our approach on the RIMES and IAM Handwriting Database benchmarks, where it consistently outperforms previous state-of-the-art methods while using 20x fewer parameters. We believe this represents a significant advancement in personalized handwritten text recognition, paving the way for more reliable and practical deployment in resource-constrained scenarios.
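The test-time adaptation idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the frozen linear map `W` is a toy stand-in for a pretrained encoder-decoder, the additive `prompt` vector is a simplified stand-in for visual prompts, and all names (`masked_reconstruction_loss`, `adapt_prompt`, `prompt_init`) are hypothetical. The sketch assumes only that a prompt initialization is given (here zeros, where a meta-learned one would be used) and shows that gradient steps on a masked reconstruction loss update the prompt alone, with no labels involved.

```python
import random


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * u for w, u in zip(row, v)) for row in W]


def masked_reconstruction_loss(W, x, prompt, masked):
    """Mean squared error on masked positions only (MAE-style objective).

    W is the frozen toy model; the prompt is added to the visible input.
    Returns the loss and the per-position residual (zero where unmasked).
    """
    visible = [0.0 if i in masked else x[i] for i in range(len(x))]
    recon = matvec(W, [v + p for v, p in zip(visible, prompt)])
    residual = [recon[i] - x[i] if i in masked else 0.0 for i in range(len(x))]
    loss = sum(r * r for r in residual) / len(masked)
    return loss, residual


def adapt_prompt(W, x, prompt_init, masked, lr=0.05, steps=50):
    """Gradient descent on the prompt only; W is never modified."""
    prompt = list(prompt_init)
    n = len(x)
    for _ in range(steps):
        _, residual = masked_reconstruction_loss(W, x, prompt, masked)
        # Analytic gradient: dL/dp_j = (2 / |masked|) * sum_i W[i][j] * residual[i]
        grad = [
            2.0 / len(masked) * sum(W[i][j] * residual[i] for i in range(n))
            for j in range(n)
        ]
        prompt = [p - lr * g for p, g in zip(prompt, grad)]
    return prompt


random.seed(0)
DIM = 8
# Frozen toy "pretrained" weights (assumption: random, for illustration only).
W = [[random.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(DIM)]
# One unlabeled test-time example from a new writer (toy feature vector).
x = [random.uniform(-1.0, 1.0) for _ in range(DIM)]
masked = {1, 3, 4, 6}          # positions hidden from the model
prompt_init = [0.0] * DIM      # placeholder for a meta-learned initialization

loss_before, _ = masked_reconstruction_loss(W, x, prompt_init, masked)
adapted = adapt_prompt(W, x, prompt_init, masked)
loss_after, _ = masked_reconstruction_loss(W, x, adapted, masked)
print(f"reconstruction loss: {loss_before:.4f} -> {loss_after:.4f}")
```

In this toy setting the trainable state is the `DIM`-dimensional prompt, while the `DIM * DIM` model weights stay fixed. The sketch does not model the meta-learning stage, which in the paper is what ties a lower reconstruction loss to a lower recognition error.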
