Watermark Text Pattern Spotting in Document Images
Mateusz Krubiński, Stefan Matcovici, Diana Grigore, Daniel Voinea, Alin-Ionut Popa
TL;DR
Watermark text spotting in document images is challenging due to occlusion, varying fonts, and diverse layouts. The authors introduce K-Watermark, a large synthetic benchmark of 65,447 samples generated with Wrender, and Wextract, an end-to-end detector-recognizer that uses a variance minimization loss and a hierarchical local-global self-attention encoder-decoder. On K-Watermark, Wextract achieves state-of-the-art performance on both detection and recognition, outperforming strong baselines and demonstrating robustness to faded text and dense overlays. These contributions provide a practical dataset and methodology to advance watermark understanding within OCR-style pipelines and broader visual document understanding tasks.
Abstract
Watermark text spotting in document images can offer access to an often unexplored source of information, providing crucial evidence about a record's scope, audience and sometimes even authenticity. Stemming from the problem of text spotting, detecting and understanding watermarks in documents inherits the same hardships: in the wild, writing can come in various fonts, sizes and forms, making generic recognition a very difficult problem. To address the lack of resources in this field and propel further research, we propose a novel benchmark (K-Watermark) containing 65,447 data samples generated using Wrender, a rendering procedure for watermark text patterns. A validity study using human raters yields an authenticity score of 0.51 against pre-generated watermarked documents. To prove the usefulness of the dataset and rendering technique, we developed an end-to-end solution (Wextract) that detects the bounding box instances of watermark text and predicts the depicted text. To deal with this specific task, we introduce a variance minimization loss and a hierarchical self-attention mechanism. To the best of our knowledge, we are the first to propose an evaluation benchmark and a complete solution for retrieving watermarks from documents, surpassing baselines by 5 AP points in detection and 4 points in character accuracy.
