
Watermark Text Pattern Spotting in Document Images

Mateusz Krubiński, Stefan Matcovici, Diana Grigore, Daniel Voinea, Alin-Ionut Popa

TL;DR

Watermark text spotting in document images is challenging due to occlusion, varying fonts, and diverse layouts. The authors introduce K-Watermark, a large synthetic benchmark of 65,447 samples generated with Wrender, and Wextract, an end-to-end detector-recognizer that uses a variance minimization loss and a hierarchical local-global self-attention encoder-decoder. On K-Watermark, Wextract achieves state-of-the-art performance on both detection and recognition, surpassing baselines by 5 AP points in detection and 4 points in character accuracy, and demonstrates robustness to faded text and dense overlays. These contributions provide a practical dataset and methodology to advance watermark understanding within OCR-style pipelines and broader visual document understanding tasks.

Abstract

Watermark text spotting in document images can offer access to an often unexplored source of information, providing crucial evidence about a record's scope, audience and sometimes even authenticity. Stemming from the problem of text spotting, detecting and understanding watermarks in documents inherits the same hardships: in the wild, writing can come in various fonts, sizes and forms, making generic recognition a very difficult problem. To address the lack of resources in this field and propel further research, we propose a novel benchmark (K-Watermark) containing 65,447 data samples generated using Wrender, a watermark text pattern rendering procedure. A validity study using human raters yields an authenticity score of 0.51 against pre-generated watermarked documents. To prove the usefulness of the dataset and rendering technique, we developed an end-to-end solution (Wextract) that detects the bounding box instances of watermark text while predicting the depicted text. To deal with this specific task, we introduce a variance minimization loss and a hierarchical self-attention mechanism. To the best of our knowledge, we are the first to propose an evaluation benchmark and a complete solution for retrieving watermarks from documents, surpassing baselines by 5 AP points in detection and 4 points in character accuracy.
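The abstract reports gains in detection AP and in character accuracy but does not define either metric here. As a rough illustration only, the sketch below shows two building blocks such metrics commonly rely on: intersection-over-union (IoU) between axis-aligned boxes, which AP uses to decide whether a detection matches a ground-truth box, and a character accuracy computed as one minus the Levenshtein distance normalized by the ground-truth length. The function names (`iou`, `levenshtein`, `char_accuracy`) and the exact normalization are assumptions, not the paper's evaluation code.

```python
# Hypothetical evaluation helpers; the paper's exact metric definitions may differ.


def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def char_accuracy(pred, target):
    """Assumed definition: 1 - edit_distance / len(target), clipped at 0."""
    if not target:
        return 1.0 if not pred else 0.0
    return max(0.0, 1.0 - levenshtein(pred, target) / len(target))
```

For example, the Figure 5 failure case that reads "significane" instead of "significance" is a single edit away from the target, so `char_accuracy("significane", "significance")` gives 11/12 under this definition, while a detection overlapping its ground-truth box by one unit square out of seven gives an IoU of 1/7, well below the 0.5 threshold commonly used for AP matching.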

Paper Structure (11 sections, 6 equations, 8 figures, 6 tables, 1 algorithm)

Figures (8)

  • Figure 1: Sample watermarked document image patches. Recovering watermark text is challenging (a) from a visual perspective, due to its resemblance to other document elements (rightmost image), and (b) from a textual / language perspective, due to high fadedness causing the text to be misread because of overlapping document text (upper-left image).
  • Figure 2: Detailed overview of the proposed $\mathcal{W}$extract method. Given an input image $\mathbf{I}$, we obtain a set of watermark region proposals via the $\mathrm{\Psi}_\texttt{CLS}$ and $\mathrm{\Psi}_\texttt{BBX}$ heads. At the same time, we construct a global embedding representation by performing self-attention on top of the sequence of watermark region proposals generated via $\mathrm{\Psi}_\texttt{ROI}$ and a local embedding representation built using self-attention at class-agnostic proposal feature level via $\mathrm{\Psi}_\texttt{RPN}$. Lastly, a watermark character-level prediction via $\mathrm{\Psi}_\texttt{TXT}$ is applied on top of the decoded joint global and local information.
  • Figure 3: Word cloud visualizing the frequency of the words used throughout our training dataset. These are some of the most frequent English words commonly used in newspapers and mass-media content.
  • Figure 4: Generated individual watermark text patterns. Our $\mathcal{W}$render insertion procedure gives us full control over the watermark pattern insertion process, similar to off-the-shelf professional document editing tools. Below each image patch, we specify the transparency (on a scale from $0$ to $1$), font (randomly sampled from Google Fonts) and angle (from $-90^\circ$ to $90^\circ$) used.
  • Figure 5: Watermark text detection failure cases of $\mathcal{W}$extract on K-Watermark. Detections are highlighted with green bounding boxes, and the recognized text is written as yellow highlighted text at the top of each image. Failures usually occur where the watermark overlaps non-uniform or dark-coloured visual elements, which impairs text recognition (e.g., leftmost image: significance vs. significane). These scenarios are complex, even for humans.
  • ...and 3 more figures
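The Figure 2 caption describes Wextract as applying self-attention at two levels: locally, over class-agnostic proposal features, and globally, across the sequence of region proposals. The code below is a simplified, hypothetical sketch of that local-global idea using single-head scaled dot-product attention with identity query/key/value projections (no learned weights), so it is not the authors' module. Each proposal is assumed to be a list of per-cell feature vectors. The local step attends among one proposal's cells and mean-pools them, the global step attends across the pooled proposal vectors, and the two results are concatenated per proposal. The names `self_attention`, `mean_pool` and `local_global_encode` are illustrative.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(seq):
    """Single-head scaled dot-product self-attention with identity Q/K/V projections.

    seq: list of N feature vectors, each of length d.
    Returns N context vectors, each a softmax-weighted average of the inputs.
    """
    d = len(seq[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in seq:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, seq)) for c in range(d)])
    return out


def mean_pool(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[c] for v in vectors) / n for c in range(len(vectors[0]))]


def local_global_encode(proposals):
    """Hypothetical two-level encoder (an assumption, not the paper's exact design).

    proposals: list of proposals, each a list of per-cell feature vectors.
    Returns one [local ; global] vector per proposal.
    """
    local = [mean_pool(self_attention(cells)) for cells in proposals]  # within a proposal
    global_ = self_attention(local)                                    # across proposals
    return [l + g for l, g in zip(local, global_)]  # list "+" concatenates the two halves
```

Keeping the projections as identities makes the attention weights depend only on raw feature similarity; a trained model would instead learn separate query, key and value matrices (and typically several heads) for each level.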
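The Figure 4 caption lists three controls used by Wrender: transparency on a 0 to 1 scale, a font, and a rotation angle between -90 and 90 degrees. The sketch below shows, under stated assumptions, how such parameters could be sampled and applied: uniform sampling over the quoted ranges, the transparency value treated as the alpha (opacity) of the text layer (the direction of the scale is an assumption), standard "over" compositing of a single RGB pixel, and the axis-aligned extent of a rotated text box. `WatermarkParams`, `sample_params`, `blend`, `rotated_box` and the font names are hypothetical; the paper's actual sampling distributions and rendering code are not described here.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class WatermarkParams:
    transparency: float  # assumed to act as alpha: 0 = invisible, 1 = fully opaque
    font: str            # font family name (example values only)
    angle_deg: float     # rotation in [-90, 90] degrees


def sample_params(fonts, rng):
    """Draw one parameter set uniformly over the ranges quoted in Figure 4 (assumed uniform)."""
    return WatermarkParams(
        transparency=rng.uniform(0.0, 1.0),
        font=rng.choice(fonts),
        angle_deg=rng.uniform(-90.0, 90.0),
    )


def blend(background, ink, alpha):
    """'Over' compositing of one RGB pixel: (1 - alpha) * background + alpha * ink, rounded."""
    return tuple(round((1 - alpha) * b + alpha * i) for b, i in zip(background, ink))


def rotated_box(cx, cy, w, h, angle_deg):
    """Axis-aligned box (x1, y1, x2, y2) enclosing a w x h text box rotated about its center."""
    t = math.radians(angle_deg)
    half_w = (abs(w * math.cos(t)) + abs(h * math.sin(t))) / 2
    half_h = (abs(w * math.sin(t)) + abs(h * math.cos(t))) / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```

For instance, `blend((255, 255, 255), (0, 0, 0), 0.5)` yields a mid-grey `(128, 128, 128)` pixel, and a 4 x 2 box rotated by 90 degrees occupies a 2 x 4 axis-aligned region, which illustrates why steeply rotated watermarks produce tall, narrow detection targets.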