Task-driven single-image super-resolution reconstruction of document scans

Maciej Zyrek; Michal Kawulok

TL;DR

This work treats super-resolution (SR) of document scans as a preprocessing step for OCR. SR networks are trained in a self-supervised, task-driven way with a multi-task loss that combines image-fidelity terms with guidance from a frozen CTPN text detector, which supplies the task-space signal; dynamic weight averaging balances the loss components. On benchmark and real scanned datasets, the approach improves text-detection IoU while keeping SR-quality metrics robust. Text recognition and multi-image SR are identified as natural extensions.

Abstract

Super-resolution reconstruction aims to generate images of high spatial resolution from low-resolution observations. State-of-the-art super-resolution techniques underpinned by deep learning produce results of outstanding visual quality, but it is seldom verified whether these results are a valuable input for specific computer vision applications. In this paper, we investigate the possibility of employing super-resolution as a preprocessing step to improve optical character recognition from document scans. To achieve that, we propose to train deep networks for single-image super-resolution in a task-driven way to make them better adapted for the purpose of text detection. As problems limited to a specific task are heavily ill-posed, we introduce a multi-task loss function that combines components related to text detection with components guided by image similarity. The results reported in this paper are encouraging and constitute an important step towards real-world super-resolution of document images.
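The multi-task loss balanced by dynamic weight averaging can be illustrated with a minimal sketch. It assumes the commonly used formulation of dynamic weight averaging, in which each task weight is a softmax over the ratio of that task's two most recent loss values, scaled by the number of tasks; the exact variant and temperature used in the paper are not stated in this summary. The component names follow the figure captions (L2-HR, L2-LR, CTPN-deep, CTPN-out), while the function names, the temperature of 2.0, and all loss values are hypothetical and chosen only for illustration.

```python
import math


def dwa_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """Dynamic weight averaging (assumed standard form, not taken from the paper).

    Each weight is K * softmax(r_k / T), where r_k is the ratio of the
    task's loss at the previous step to its loss two steps back.
    Tasks whose loss falls more slowly receive a larger weight.
    """
    ratios = [p / pp for p, pp in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    k = len(ratios)
    return [k * e / total for e in exps]


def combined_loss(losses, weights):
    """Weighted sum of the individual loss components."""
    return sum(w * l for w, l in zip(weights, losses))


# Hypothetical loss values for the four components named in the figure captions.
names = ["L2-HR", "L2-LR", "CTPN-deep", "CTPN-out"]
prev_prev = [0.40, 0.30, 1.20, 0.90]  # losses two steps back (made up)
prev = [0.20, 0.30, 1.20, 0.99]       # losses one step back (made up)

weights = dwa_weights(prev, prev_prev)
total = combined_loss(prev, weights)

for name, w in zip(names, weights):
    print(f"{name}: weight {w:.3f}")
print(f"combined loss: {total:.3f}")
```

In this toy setting the L2-HR term, whose loss halved, is down-weighted, and the CTPN-out term, whose loss grew, is up-weighted; the weights always sum to the number of components, so the overall loss scale stays comparable across steps.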

Paper Structure (6 sections, 2 equations, 2 figures, 1 table)

Introduction
Related Work
Contribution
Proposed Approach
Experiments
Conclusions and outlook

Figures (2)

Figure 1: Outline of the proposed self-supervised task-driven training underpinned with text detection. Red arrows indicate the propagation of the loss functions, and the black arrows show the data flow.
Figure 2: Example of SR reconstruction performed using: (a) SRCNN, (b) FSRCNN and (c) SRResNet models, all trained with the L2-HR loss function, (d) fine-tuned SRResNet model using all loss functions (L2-HR, L2-LR, CTPN-deep and CTPN-out), and (e) SRResNet trained from scratch with the task-based CTPN-deep and CTPN-out loss functions. These settings are also referred to in Table 1. We present the examples from the Old Books dataset (two upper rows) and from our dataset with scanned documents (two bottom rows). For each example, we include the detection input (i.e., the SR outcome) and the result of text detection.
