Real-Time Text Detection with Similar Mask in Traffic, Industrial, and Natural Scenes

Xu Han, Junyu Gao, Chuang Yang, Yuan Yuan, Qi Wang

TL;DR

This work proposes an efficient multi-scene text detector built on two components: the similar mask (SM), a text representation whose post-processing takes 50% less time while accurately reconstructing text contours, and a feature correction module (FCM).

Abstract

Text in intelligent transportation scenes carries a wealth of information. Fully harnessing this information is one of the critical drivers for advancing intelligent transportation. Unlike general scenes, text detection in transportation imposes extra demands beyond high accuracy, such as fast inference speed. Most existing real-time text detection methods are based on the shrink mask, which loses some geometric semantic information and needs complex post-processing. In addition, previous methods usually focus on producing correct outputs, ignoring feature correction and lacking guidance during the intermediate process. To this end, we propose an efficient multi-scene text detector that contains an effective text representation, the similar mask (SM), and a feature correction module (FCM). Unlike previous methods, the former aims to preserve the geometric information of the instances as much as possible. Its post-processing saves 50% of the time while accurately and efficiently reconstructing text contours. The latter encourages false positive features to move away from the positive feature center, optimizing the predictions at the feature level. Ablation studies demonstrate the efficiency of the SM and the effectiveness of the FCM. Moreover, the deficiencies of existing traffic datasets (such as low-quality annotations or unavailable closed-source data) motivated us to collect and annotate a traffic text dataset, which introduces motion blur. In addition, to validate the scene robustness of the SM-Net, we conduct experiments on traffic, industrial, and natural scene datasets. Extensive experiments verify that it achieves state-of-the-art (SOTA) performance on several benchmarks. The code and dataset are available at: https://github.com/fengmulin/SMNet.
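
The abstract does not spell out how a similar mask is built or decoded. As a rough illustration only, the sketch below assumes the mask is the text contour scaled about its centroid by a fixed ratio, so that the mask is geometrically similar to the contour and the contour can be recovered with a single inverse scaling. The function names, the `ratio` parameter, and the choice of the centroid as scaling center are all hypothetical and are not taken from the paper or its code.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_centroid(poly: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = cx = cy = 0.0
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # area2 is twice the signed area, so the usual 6*A divisor is 3*area2.
    return cx / (3.0 * area2), cy / (3.0 * area2)


def scale_about(poly: List[Point], center: Point, ratio: float) -> List[Point]:
    """Scale every vertex toward (ratio < 1) or away from (ratio > 1) center."""
    cx, cy = center
    return [(cx + ratio * (x - cx), cy + ratio * (y - cy)) for x, y in poly]


def make_similar_mask(contour: List[Point], ratio: float = 0.5) -> List[Point]:
    """Hypothetical label generation: a smaller polygon similar to the contour."""
    return scale_about(contour, polygon_centroid(contour), ratio)


def reconstruct_contour(mask: List[Point], ratio: float = 0.5) -> List[Point]:
    """Hypothetical decoding: undo the scaling with the inverse ratio."""
    return scale_about(mask, polygon_centroid(mask), 1.0 / ratio)


if __name__ == "__main__":
    contour = [(0.0, 0.0), (8.0, 0.0), (8.0, 2.0), (0.0, 2.0)]
    mask = make_similar_mask(contour, ratio=0.5)
    print(mask)
    print(reconstruct_contour(mask, ratio=0.5))
```

Under this assumption, decoding is a per-vertex affine map, which is cheaper than the polygon offsetting typically used to expand a shrink mask; whether this matches the source of the reported 50% saving is not stated in the abstract.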

Paper Structure

This paper contains 39 sections, 23 equations, 10 figures, 14 tables.

Figures (10)

  • Figure 1: Text in different scenes. Text in natural scenes appears against complex backgrounds. In industrial scenes, low visual contrast and corroded surfaces make it difficult to detect text accurately. Motion blur and changes in weather and lighting are the main challenges for traffic scenes.
  • Figure 2: Visualization of the data distribution of the proposed MBTST-1528 dataset. (a) Instance area distributions. (b) Character number distributions.
  • Figure 3: Visualization of an original image and a motion-blurred image.
  • Figure 4: Visualization of some annotation errors in the TGPD dataset. Erroneous and missing annotations are labeled in red and orange, respectively.
  • Figure 5: Visualization of label generation for the shrink mask and the similar mask. The blue, green, and orange polygons represent the text contour, shrink mask, and similar mask, respectively.
  • ...and 5 more figures