Text Region Multiple Information Perception Network for Scene Text Detection

Jinzhi Zheng; Libo Zhang; Yanjun Wu; Chen Zhao

TL;DR

This work addresses a limitation of segmentation-based scene text detectors, which focus on the text center region, by introducing a Region Multiple Information Perception Module (RMIPM). The resulting network, RMIPN, combines a U-Net-like backbone with RMIPM so that it perceives center, foreground, distance-to-edge, and edge-direction cues simultaneously and produces enhanced text-region features for detection. The model is trained with a multi-task loss over four target maps, which provides richer supervision. Experiments on TotalText and MSRA-TD500 show performance competitive with state-of-the-art methods, with notable gains on MSRA-TD500, and the ablation study indicates that RMIPM improves recall, precision, and F1, supporting the effectiveness of multi-region perception for scene text detection.
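A multi-task objective over four target maps is often written as a weighted sum of per-map losses. The form below is only a generic sketch under that assumption: the weights $\lambda_k$ and the individual loss terms are illustrative symbols and are not taken from the paper, whose exact objective is given in its Training Objective Loss section.

```latex
\mathcal{L}
  = \lambda_{c}\,\mathcal{L}_{\text{center}}
  + \lambda_{f}\,\mathcal{L}_{\text{foreground}}
  + \lambda_{d}\,\mathcal{L}_{\text{distance}}
  + \lambda_{\theta}\,\mathcal{L}_{\text{direction}}
```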

Abstract

Segmentation-based scene text detection algorithms can handle scene text of arbitrary shape and are robust and adaptable, so they have attracted wide attention. Existing segmentation-based algorithms usually segment only the pixels in the center region of the text and ignore other information about the text region, such as edge and distance information, which limits their detection accuracy. This paper proposes a plug-and-play module, the Region Multiple Information Perception Module (RMIPM), to enhance the detection performance of segmentation-based algorithms. Specifically, we design an improved module that can perceive several types of information about scene text regions, such as text foreground classification maps, distance maps, and direction maps. Experiments on the MSRA-TD500 and TotalText datasets show that our method achieves performance comparable to current state-of-the-art algorithms.
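To make the kinds of region information named above concrete, here is a minimal sketch of how foreground, distance, direction, and center maps could be derived from a binary text mask. The function name `region_target_maps`, the `center_ratio` threshold, and the map definitions themselves (distance to the nearest background pixel, unit vector toward it) are assumptions for illustration, not the paper's label-generation rules.

```python
import math

def region_target_maps(mask, center_ratio=0.5):
    """Build four illustrative target maps from a binary text mask.

    mask: list of rows of 0/1 values (1 = text pixel).
    Returns (foreground, distance, direction, center). The definitions
    used here are assumptions, not the paper's exact label rules.
    """
    h, w = len(mask), len(mask[0])
    background = [(y, x) for y in range(h) for x in range(w) if not mask[y][x]]
    foreground = [[1 if mask[y][x] else 0 for x in range(w)] for y in range(h)]
    distance = [[0.0] * w for _ in range(h)]
    direction = [[(0.0, 0.0)] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or not background:
                continue
            # Brute-force nearest background pixel (fine for a toy example).
            by, bx = min(background,
                         key=lambda p: (p[0] - y) ** 2 + (p[1] - x) ** 2)
            d = math.hypot(by - y, bx - x)
            distance[y][x] = d
            # Unit vector (dx, dy) from the pixel toward its nearest edge.
            direction[y][x] = ((bx - x) / d, (by - y) / d)
    # Normalize distances to [0, 1] so the center threshold is scale-free.
    peak = max(max(row) for row in distance)
    if peak > 0:
        distance = [[d / peak for d in row] for row in distance]
    # Center region: pixels whose normalized distance reaches the threshold.
    center = [[1 if d >= center_ratio else 0 for d in row] for row in distance]
    return foreground, distance, direction, center
```

The brute-force nearest-pixel search keeps the sketch dependency-free; a real pipeline would more likely use a distance transform from an image library.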

Paper Structure (11 sections, 6 equations, 4 figures, 3 tables)

Introduction
Proposed Method
Overview
Detailed Architecture
Training Objective Loss
Experiments
Datasets
Implementation Details
Comparison with SOTA Approaches
Ablation Study
Conclusion

Figures (4)

Figure 1: The overall architecture of our RMIPN, which mainly consists of a Backbone, a Region Multiple Information Perception Module (RMIPM), and a Detection Head.
Figure 2: The overall architecture of the Text Region Information Perception Module (IPM). Red arrows indicate components that are present only during the training phase.
Figure 3: Examples of ground-truth labels for the TotalText dataset. 'X Direction Map' is the visualization of the x-direction component and 'Y Direction Map' is the visualization of the y-direction component.
Figure 4: Examples of RMIPN detection results on the MSRA-TD500, ICDAR2015, and TotalText datasets. (a) MSRA-TD500 dataset, (b) TotalText dataset.
