Text Image Inpainting via Global Structure-Guided Diffusion Models

Shipeng Zhu, Pengfei Fang, Chenjie Zhu, Zuoyan Zhao, Qiang Xu, Hui Xue

TL;DR

This work tackles text image inpainting under real-world corrosion by introducing two benchmark datasets, TII-ST and TII-HT, that combine synthetic and real data across three corrosion forms. It presents Global Structure-guided Diffusion Model (GSDM), a two-module framework where a Structure Prediction Module supplies a complete global structure and a Reconstruction Module performs diffusion-based restoration conditioned on the corrupted image and predicted structure. The approach yields significant gains in downstream recognition accuracy and image quality over state-of-the-art baselines on both scene and handwritten text, and ablations reveal the importance of predicting target content, integrating semantic/style guidance, and efficient non-Markov sampling. The benchmark and method offer a practical path to more reliable text image understanding and processing in corrupted real-world scenarios.

Abstract

Real-world text can be damaged by corrosion caused by environmental or human factors, which prevents the complete style of the text, e.g., texture and structure, from being preserved. Such corrosion, as seen in graffiti-covered signs and incomplete signatures, makes the text difficult to understand and poses significant challenges to downstream applications, e.g., scene text recognition and signature identification. Notably, current inpainting techniques often fail to address this problem adequately and have difficulty restoring accurate text images with reasonable and consistent styles. Formulating this as an open problem of text image inpainting, this paper aims to build a benchmark to facilitate its study. To this end, we establish two text inpainting datasets, which contain scene text images and handwritten text images, respectively. Each includes images derived from real-life and synthetic datasets, featuring pairs of original and corrupted images together with other auxiliary information. On top of the datasets, we further develop a novel neural framework, Global Structure-guided Diffusion Model (GSDM), as a potential solution. Leveraging the global structure of the text as a prior, the proposed GSDM develops an efficient diffusion model to recover clean texts. The efficacy of our approach is demonstrated by a thorough empirical study, which shows a substantial boost in both recognition accuracy and image quality. These findings not only highlight the effectiveness of our method but also underscore its potential to enhance the broader field of text image understanding and processing. Code and datasets are available at: https://github.com/blackprotoss/GSDM.
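
To make the two-stage idea concrete, the following is a minimal sketch of structure-guided restoration with a deterministic, non-Markov (DDIM-style) sampler. It is not the authors' implementation. The functions `predict_structure` and `predict_clean` are hypothetical stand-ins for a trained Structure Prediction Module and a trained denoiser, and the noise schedule, step counts, and image sizes are assumptions chosen only for illustration. The denoiser here predicts the clean image directly rather than the noise, which is one common parameterization.

```python
import numpy as np


def predict_structure(corrupted):
    """Hypothetical stand-in for a structure predictor: a crude binarized stroke map."""
    return (corrupted > corrupted.mean()).astype(np.float32)


def predict_clean(x_t, corrupted, structure, t):
    """Hypothetical stand-in for a trained denoiser that predicts the clean image.

    A real model would be a network conditioned on x_t, the corrupted image,
    the predicted structure, and the timestep t.
    """
    return 0.5 * corrupted + 0.5 * structure


def ddim_restore(corrupted, num_steps=5, train_steps=1000, seed=0):
    """Deterministic DDIM-style sampling over a short sub-sequence of timesteps."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 2e-2, train_steps)  # assumed linear schedule
    alpha_bar = np.cumprod(1.0 - betas)
    # Visit only num_steps of the train_steps timesteps, from noisy to clean.
    timesteps = np.linspace(train_steps - 1, 0, num_steps).round().astype(int)

    structure = predict_structure(corrupted)  # stage 1: global structure prior
    x = rng.standard_normal(corrupted.shape)  # stage 2 starts from pure noise
    for i, t in enumerate(timesteps):
        a_t = alpha_bar[t]
        a_prev = alpha_bar[timesteps[i + 1]] if i + 1 < num_steps else 1.0
        x0 = predict_clean(x, corrupted, structure, t)
        # Noise implied by the current sample and the predicted clean image.
        eps = (x - np.sqrt(a_t) * x0) / np.sqrt(1.0 - a_t)
        # Deterministic update (eta = 0): no fresh noise is injected.
        x = np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev) * eps
    return x


if __name__ == "__main__":
    image = np.zeros((8, 32), dtype=np.float32)
    image[2:6, 4:28] = 1.0  # toy "text" region
    restored = ddim_restore(image)
    print(restored.shape)
```

Because the update is deterministic and skips most training timesteps, the loop runs in a handful of steps. With real trained modules, the quality of the result would depend on how well the predicted structure matches the true text.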

Paper Structure (39 sections, 12 equations, 11 figures, 14 tables)

Figures (11)

  • Figure 1: The illustration of corrosion forms in real-life scenarios and the challenges of text image inpainting.
  • Figure 2: The illustration of inpainting images with recognition results based on different methods. Panels (i) to (vi) denote Corrupted images, DDIM, CoPaint, TransCNN-HAE, GSDM, and GT, respectively. Red characters indicate errors.
  • Figure 3: Some training examples in the two datasets. The images of the first three rows are from TII-ST and the images of the last three rows are from TII-HT.
  • Figure 4: The overall architecture of our proposed Global Structure-guided Diffusion Model (GSDM). It consists of two main modules: Structure Prediction Module (SPM) and Reconstruction Module (RM).
  • Figure 5: The inpainting images with recognition results on TII-ST (ASTER) and TII-HT (TrOCR-L). Red characters indicate errors. The (i) to (vii) denote Corrupted Images, TSINIT/Wang et al., DDIM, CoPaint, TransCNN, GSDM, and GT, respectively.
  • ...and 6 more figures