Efficient scene text image super-resolution with semantic guidance

LeoWu TomyEnrique, Xiangcheng Du, Kangliang Liu, Han Yuan, Zhao Zhou, Cheng Jin

TL;DR

The paper addresses the need for efficient scene text image super-resolution (STISR) on resource-limited platforms. It introduces SGENet, a two-branch architecture made up of a super-resolution branch and a semantic guidance branch, the latter driven by a lightweight pre-trained recognizer. A visual-semantic alignment module aligns image features with the extracted semantics to produce high-quality priors that guide SR. Trained with a joint loss that blends reconstruction and recognition guidance, SGENet reaches competitive accuracy on the TextZoom dataset with far fewer parameters and FLOPs than heavier methods. This makes edge deployment of STISR practical without a substantial loss in recognition performance.
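
The joint loss mentioned above can be pictured as a weighted sum of a reconstruction term and a recognition term. The sketch below is only an illustration under stated assumptions: this summary does not give the paper's exact loss terms or weights, so mean squared error, cross-entropy, the weight `lam`, and all function names are hypothetical stand-ins.

```python
import math


def mse(sr_pixels, hr_pixels):
    """Mean squared error between two equally sized flat pixel lists."""
    if len(sr_pixels) != len(hr_pixels):
        raise ValueError("SR and HR images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(sr_pixels, hr_pixels)) / len(sr_pixels)


def cross_entropy(char_probs, target_ids):
    """Mean negative log-likelihood of the ground-truth characters.

    char_probs[i] is a probability distribution over the alphabet for
    character position i; target_ids[i] is the index of the true character.
    """
    eps = 1e-12  # avoids log(0)
    total = sum(math.log(p[t] + eps) for p, t in zip(char_probs, target_ids))
    return -total / len(target_ids)


def joint_loss(sr_pixels, hr_pixels, char_probs, target_ids, lam=0.5):
    """Reconstruction loss plus lam times recognition loss.

    lam is a hypothetical weight, not a value taken from the paper.
    """
    return mse(sr_pixels, hr_pixels) + lam * cross_entropy(char_probs, target_ids)
```

For instance, `joint_loss([0.0, 0.0], [0.0, 2.0], [[0.5, 0.5]], [0], lam=1.0)` adds a reconstruction error of 2.0 to a recognition error of ln 2. Raising `lam` pushes training toward recognizable text at the expense of pixel fidelity.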

Abstract

Scene text image super-resolution has significantly improved the accuracy of scene text recognition. However, many existing methods emphasize performance over efficiency and ignore the practical need for lightweight solutions in deployment scenarios. To address these issues, we propose an efficient framework called SGENet to facilitate deployment on resource-limited platforms. SGENet contains two branches: a super-resolution branch and a semantic guidance branch. We apply a lightweight pre-trained recognizer as a semantic extractor to enhance the understanding of text information. Meanwhile, we design a visual-semantic alignment module to achieve bidirectional alignment between image features and semantics, which generates high-quality prior guidance. We conduct extensive experiments on a benchmark dataset, and the proposed SGENet achieves excellent performance at lower computational cost. Code is available at https://github.com/SijieLiu518/SGENet
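
One common way to align two feature sets in both directions is cross-attention applied twice, once with image features as queries and once with semantic features as queries. The sketch below illustrates that general idea only. It is not the paper's visual-semantic alignment module, whose internals are not described in this summary; the function names are hypothetical, and learned projection matrices are omitted for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values):
    """Scaled dot-product attention with keys == values (no projections).

    Each query vector is replaced by a softmax-weighted average of the
    vectors in keys_values. All vectors must share the same dimension d.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys_values
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * kv[j] for w, kv in zip(weights, keys_values)) for j in range(d)]
        )
    return out


def bidirectional_align(image_feats, semantic_feats):
    """Attend in both directions and return both aligned feature sets."""
    image_aligned = attend(image_feats, semantic_feats)     # image queries semantics
    semantic_aligned = attend(semantic_feats, image_feats)  # semantics query image
    return image_aligned, semantic_aligned
```

Here `image_feats` would be a list of per-location image vectors and `semantic_feats` a list of per-character vectors from the recognizer, both of dimension `d`. Each output keeps the length of its query set, so both can be fused back into their own branch.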

Paper Structure

This paper contains 12 sections, 6 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Total number of parameters vs. scene text recognition accuracy for different STISR networks. Our model achieves satisfactory accuracy with fewer parameters.
  • Figure 2: The overall architecture of SGENet. It consists of two branches, the semantic guidance branch and the super-resolution branch. The output of the semantic guidance branch is used to guide super-resolution reconstruction.
  • Figure 3: Visualization results of SGENet on the TextZoom dataset.