Spotlight Text Detector: Spotlight on Candidate Regions Like a Camera

Xu Han, Junyu Gao, Chuang Yang, Yuan Yuan, Qi Wang

TL;DR

An effective spotlight text detector (STD) is proposed. It consists of a spotlight calibration module (SCM), which concentrates on the candidate kernel like a camera focusing on its target, and a multivariate information extraction module (MIEM).

Abstract

Representing irregular contours is one of the tough challenges in scene text detection. Although segmentation-based methods have achieved significant progress with the help of flexible pixel-level prediction, the overlap of spatially close texts hinders detecting them separately. To alleviate this problem, some shrink-based methods predict text kernels and expand them to reconstruct the text regions. However, the text kernel is an artificial object with incomplete semantic features, which makes it prone to false or missed detections. In addition, unlike general objects, the geometric features (aspect ratio, scale, and shape) of scene texts vary significantly, which makes it difficult to detect them accurately. To address these problems, we propose an effective spotlight text detector (STD), which consists of a spotlight calibration module (SCM) and a multivariate information extraction module (MIEM). The former concentrates on the candidate kernel, like a camera focusing on its target. It obtains candidate features through a mapping filter and calibrates them precisely to eliminate some false positive samples. The latter designs different shape schemes to explore multiple geometric features of scene texts. It helps extract various spatial relationships to improve the model's ability to recognize kernel regions. Ablation studies demonstrate the effectiveness of the designed SCM and MIEM. Extensive experiments verify that our STD is superior to existing state-of-the-art methods on various datasets, including ICDAR2015, CTW1500, MSRA-TD500, and Total-Text.
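
To make the shrink-based idea in the abstract concrete, the sketch below computes the inward offset that kernel-based detectors such as PSE and DB use to shrink a text polygon into a kernel, d = A(1 - r^2) / L, where A is the polygon area, L its perimeter, and r the shrink ratio. The abstract does not state which shrink rule or ratio STD uses, so the ratio of 0.4 and all function names here are assumptions for illustration, and the shrinking step is shown only for an axis-aligned box rather than a general polygon.

```python
import math


def polygon_area(points):
    """Absolute polygon area via the shoelace formula."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_perimeter(points):
    """Sum of edge lengths of a closed polygon."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def shrink_offset(points, ratio=0.4):
    """Inward offset d = A * (1 - r^2) / L (ratio is an assumed value)."""
    area = polygon_area(points)
    perimeter = polygon_perimeter(points)
    return area * (1.0 - ratio ** 2) / perimeter


def shrink_box(x0, y0, x1, y1, offset):
    """Move every side of an axis-aligned box inward by `offset`."""
    return (x0 + offset, y0 + offset, x1 - offset, y1 - offset)


# A 100 x 20 text box: A = 2000, L = 240, so d = 2000 * 0.84 / 240 = 7.
text_box = [(0, 0), (100, 0), (100, 20), (0, 20)]
d = shrink_offset(text_box, ratio=0.4)
kernel = shrink_box(0, 0, 100, 20, d)
```

For this box the kernel is roughly 86 x 6 pixels, which illustrates why a kernel carries only part of the text's appearance: most of the character strokes near the border fall outside it.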

Paper Structure (28 sections, 21 equations, 9 figures, 10 tables)

Figures (9)

  • Figure 1: Illustration of the spotlight calibration module. The coarse mask is generated as in existing methods such as DB, PSE, and PAN. The module focuses on the features of the candidate region and ignores the rest, further calibrating the prediction to generate a refined mask that is superior to the coarse mask.
  • Figure 2: The overall structure of the proposed STD. It is composed of the backbone, multivariate information extraction module, feature pyramid network, coarse segmentation head, and spotlight calibration module. The MIEM and FPN are used to enhance the feature fusion. The SCM is used to calibrate the coarse mask to generate the refined mask. The cascading progressive feature search module (CPFSM) is a part of SCM, which is shown in Fig. 4.
  • Figure 3: The overall structure of the proposed MIEM. It is used to extract multiple geometry features of text instances.
  • Figure 4: The overall pipeline of the Cascading Progressive Feature Search Module. It divides the feature maps into four groups and adopts a cascade scheme to obtain different receptive fields.
  • Figure 5: Visualization of the proposed method and the baseline (which only predicts text kernels). The ground truth and predictions shown on the images refer to text kernels. Compared with the baseline, the proposed method removes some impurities and generates more accurate predictions, achieving more reliable results.
  • ...and 4 more figures
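
The Figure 4 caption says the Cascading Progressive Feature Search Module divides the feature maps into four groups and uses a cascade to obtain different receptive fields. The sketch below works through the receptive-field arithmetic for one possible cascade. The caption does not specify the wiring, so everything here is an assumption: the first group is passed through unchanged, each later group is merged with the previous group's output and then filtered by one stride-1 convolution with a 3x3 kernel, and the 256-channel input is a hypothetical figure. It is not the authors' implementation.

```python
def split_channels(num_channels, groups=4):
    """Split channel indices into equal contiguous groups."""
    if num_channels % groups != 0:
        raise ValueError("num_channels must be divisible by groups")
    size = num_channels // groups
    return [list(range(g * size, (g + 1) * size)) for g in range(groups)]


def cascade_receptive_fields(groups=4, kernel_size=3):
    """Largest receptive field per group under the assumed cascade.

    Group g has passed through at most g stacked stride-1 convolutions,
    and each one widens the receptive field by (kernel_size - 1).
    """
    return [1 + g * (kernel_size - 1) for g in range(groups)]


channel_groups = split_channels(256, groups=4)    # hypothetical 256 channels
receptive_fields = cascade_receptive_fields()     # [1, 3, 5, 7]
```

Under these assumptions a single module sees the same location at four scales (1, 3, 5, and 7 pixels wide), which is one way a cascade can yield the different receptive fields the caption mentions.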