Seal2Real: Prompt Prior Learning on Diffusion Model for Unsupervised Document Seal Data Generation and Realisation

Mingfu Yan, Jiancheng Huang, Shifeng Chen

TL;DR

Seal2Real tackles data scarcity in document-seal analysis by learning prompt priors on a pretrained diffusion framework to generate labeled seal data in an unsupervised setting. It introduces a two-stage process: first, prompt-prior learning to model distributions of real and forged seals via prompts $T_r$ and $T_f$ using the loss $\mathcal{L}_{prompts}$, and second, a forger network trained with $\mathcal{L}_{forger} = \mathcal{L}_{prior} + w \mathcal{L}_{content}$ to produce convincing forged seals, optionally refined through adversarial training. The authors present Seal-DB, a 20,000-image dataset with paired labels for segmentation and text recognition under seals, and demonstrate substantial improvements on downstream tasks (segmentation, authenticity classification, OCR) when trained with data generated by Seal2Real compared to traditional synthesis and competing diffusion methods. This work offers a scalable, unsupervised path to high-fidelity synthetic data for document-seal processing and points to extensions to other document elements such as signatures or stamps.
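The weighted sum in $\mathcal{L}_{forger}$ can be illustrated with a minimal sketch. Only the combination $\mathcal{L}_{prior} + w \mathcal{L}_{content}$ comes from the summary above; the mean-absolute-difference form of the content term, the function names, and all numeric values are hypothetical placeholders, not the authors' definitions.

```python
from typing import Sequence


def content_loss(forged: Sequence[float], source: Sequence[float]) -> float:
    """Mean absolute difference between two flattened images.

    ASSUMPTION: a stand-in for L_content; the summary does not give its form.
    """
    if len(forged) != len(source):
        raise ValueError("inputs must have the same length")
    return sum(abs(f - s) for f, s in zip(forged, source)) / len(forged)


def forger_loss(prior_loss: float, content: float, w: float) -> float:
    """Combine the two terms as L_forger = L_prior + w * L_content."""
    return prior_loss + w * content


# Toy values (hypothetical): four "pixels" and an already-computed prior term.
forged = [0.2, 0.4, 0.9, 0.1]
source = [0.0, 0.5, 1.0, 0.1]
l_content = content_loss(forged, source)
total = forger_loss(prior_loss=0.3, content=l_content, w=0.5)
print(f"L_content={l_content:.3f}  L_forger={total:.3f}")
```

A larger `w` pushes the forger to preserve the specified seal content, while a smaller `w` lets the prior term dominate; the value that works in practice would have to come from the paper's experiments.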

Abstract

Seal-related tasks in document processing, such as seal segmentation, authenticity verification, seal removal, and text recognition under seals, hold substantial commercial importance. However, progress in these areas has been hindered by the scarcity of labeled document seal datasets, which are essential for supervised learning. To address this limitation, we propose Seal2Real, a novel generative framework designed to synthesize large-scale labeled document seal data. As part of this work, we also present Seal-DB, a comprehensive dataset containing 20,000 labeled images to support seal-related research. Seal2Real introduces a prompt prior learning architecture built upon a pre-trained Stable Diffusion model, effectively transferring its generative capability to the unsupervised domain of seal image synthesis. By producing highly realistic synthetic seal images, Seal2Real significantly enhances the performance of downstream seal-related tasks on real-world data. Experimental evaluations on the Seal-DB dataset demonstrate the effectiveness and practical value of the proposed framework.

Paper Structure (11 sections, 6 equations, 9 figures, 2 tables)

Figures (9)

  • Figure 1: Illustration of our task. We need to generate a realistic seal on a document image for making a dataset of labeled seal images.
  • Figure 2: Overview of the pretrained Stable Diffusion Model architecture employed, illustrating the key components including the Denoising U-Net. Input images $x$ are encoded ($\mathcal{E}$) to latent space $z$. Noise is added via the diffusion process. The Denoising U-Net iteratively removes noise, conditioned on external inputs (Text/Images via Encoder $E$) using cross-attention (QKV) blocks and featuring skip connections. The final latent is decoded ($D$) back to pixel space $\tilde{x}$.
  • Figure 3: The first stage of our learning framework, termed the Prompt Learning Stage, focuses on learning distributions of real and fake seal images. In this stage, a diffusion loss is used to optimize both real and forgery prompts while fine-tuning the U-Net component of the Stable Diffusion (SD) model. The fine-tuned SD model can generate realistic seal images using real text prompts and synthetic seal images using forged text prompts. However, due to inherent randomness in diffusion-based generation, the output content is non-deterministic. In contrast, our proposed forger network is capable of generating forgeries with explicitly specified content. The outputs of the fine-tuned SD model are further used to train the forger in the second stage through a weakly supervised approach.
  • Figure 4: Illustration of the second stage learning framework. (a) Forger network learning stage. After the prompt learning stage, we use the prompt priors learned in the first stage to train our seal forger network. (b) Illustration of the modeled distribution shift in the second stage.
  • Figure 5: Illustration of our Seal-DB, consisting of a forgery part and a real part. The forgery seal images, also referred to as synthetic seal images, are generated in large quantities with high realism. These images are accompanied by paired labels, including segmentation masks (for seal segmentation), seal text (for text recognition under seals), and non-seal images (for seal removal). In contrast, the real part consists of unpaired images without annotations.
  • ...and 4 more figures
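
The captions for Figures 2 and 3 mention adding noise to a latent $z$ and optimizing with a diffusion loss. As a rough illustration, the sketch below uses the standard DDPM formulation: the forward step $z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$ and a mean-squared error between true and predicted noise. The linear schedule, its endpoints, the step count, and every function name are assumptions chosen for illustration; the exact form of the loss used in the paper is not given in this summary.

```python
import math
import random
from typing import List, Sequence, Tuple


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances (ASSUMED schedule and endpoints)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: Sequence[float]) -> List[float]:
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def add_noise(z0: Sequence[float], t: int, alpha_bars: Sequence[float],
              rng: random.Random) -> Tuple[List[float], List[float]]:
    """Forward diffusion: return the noised latent z_t and the noise used."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * z + math.sqrt(1.0 - a) * e for z, e in zip(z0, eps)]
    return zt, eps


def noise_prediction_loss(eps: Sequence[float],
                          eps_pred: Sequence[float]) -> float:
    """Mean squared error between the true and the predicted noise."""
    return sum((e - p) ** 2 for e, p in zip(eps, eps_pred)) / len(eps)


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
z0 = [0.5, -1.0, 0.25]            # toy latent (hypothetical values)
zt, eps = add_noise(z0, 500, alpha_bars, rng)
zero_guess = [0.0] * len(eps)     # placeholder for a U-Net's prediction
print(noise_prediction_loss(eps, zero_guess))
```

In a real training loop the `zero_guess` placeholder would be replaced by the output of the denoising U-Net conditioned on a prompt embedding, and gradients of the loss would update the learnable prompts and the U-Net weights.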