
Edge Approximation Text Detector

Chuang Yang, Xu Han, Tao Han, Han Han, Bingxuan Zhao, Qi Wang

TL;DR

The paper tackles irregular scene text detection by introducing EdgeText. The method represents text contours as two smooth edge curves, which are modeled by parameterized polynomials $f(\Theta_t;x)$ and $g(\Theta_b;x)$ and bounded by truncation points. It proposes an end-to-end framework that combines a Bilateral Enhanced Perception (BEP) module for edge-feature recognition with a Proportional Integral loss (PI-loss) for robust learning of curve parameters. This design enables parallel curve-box reconstruction and simplified post-processing. The approach demonstrates that a curve-based edge-approximation representation can achieve high accuracy on both regular and irregular text datasets while being more efficient than box-to-polygon or piecewise-fitting methods. The results indicate that EdgeText achieves superior or competitive performance with more compact contours and faster inference, offering practical benefits for integrated text spotting pipelines.
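
As a reading aid, the representation can be written out under two assumptions that this summary does not state. First, both edges use polynomials of the same degree $n$. Second, the truncation points reduce to a shared pair of abscissae $x_l < x_r$ that bound both edges. Under these assumptions, the curve box is the region enclosed by the two truncated edge segments and the two short segments joining their endpoints:

$$f(\Theta_t;x)=\sum_{i=0}^{n}\theta_{t,i}\,x^{i},\qquad g(\Theta_b;x)=\sum_{i=0}^{n}\theta_{b,i}\,x^{i}$$

$$E_t=\{(x,\,f(\Theta_t;x)) \mid x_l\le x\le x_r\},\qquad E_b=\{(x,\,g(\Theta_b;x)) \mid x_l\le x\le x_r\}$$

Sampling $K$ abscissae $x_l=x_0<x_1<\dots<x_{K-1}=x_r$ then gives a closed polygon. It walks along the top edge from left to right and returns along the bottom edge from right to left: $(x_0,f(\Theta_t;x_0)),\dots,(x_{K-1},f(\Theta_t;x_{K-1})),(x_{K-1},g(\Theta_b;x_{K-1})),\dots,(x_0,g(\Theta_b;x_0))$.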

Abstract

Pursuing efficient text shape representations helps scene text detection models focus on compact foreground regions and optimize their contour reconstruction steps, simplifying the whole detection pipeline. Current approaches either represent irregular shapes via a box-to-polygon strategy or decompose a contour into pieces that are fitted gradually. As a result, these models suffer from either coarse contours or complex pipelines. To address these issues, we introduce EdgeText, which fits text contours compactly while avoiding excessive contour rebuilding. Concretely, we observe that the two long edges of a text instance can be regarded as smooth curves. This allows us to build contours from continuous, smooth edges that cover text regions tightly instead of fitting them piecewise, which helps avoid both limitations of current models. Inspired by this observation, EdgeText formulates text representation as an edge approximation problem solved with parameterized curve-fitting functions. In the inference stage, our model first locates text centers. Based on these points, it then creates curve functions that approximate the text edges. Meanwhile, truncation points are determined from the location features. Finally, curve segments are extracted from the curve functions using the pixel coordinates given by the truncation points, and these segments reconstruct the text contours. Furthermore, because EdgeText depends heavily on text edges, we design a bilateral enhanced perception (BEP) module that encourages the model to attend to edge features. Additionally, to accelerate the learning of the curve function parameters, we introduce a proportional integral loss (PI-loss). It forces the model to focus on the curve distribution rather than being disturbed by text scale.
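
The final inference step described above extracts curve segments between truncation points and joins them into a contour. It can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation, and it rests on three assumptions:

* Each edge polynomial is given as coefficients ordered from lowest to highest degree.
* The truncation points reduce to one left and one right x-coordinate shared by both edges.
* x is the horizontal image axis.

The function name `curve_box_polygon` and its `num_samples` parameter are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def curve_box_polygon(theta_top, theta_bottom, x_left, x_right, num_samples=16):
    """Sample a closed polygon from two edge polynomials and a truncation x-range.

    Hypothetical sketch: coefficients are ordered lowest degree first, and
    both edges are assumed to share the truncation abscissae x_left < x_right.
    """
    if x_right <= x_left:
        raise ValueError("x_right must be greater than x_left")
    xs = np.linspace(x_left, x_right, num_samples)
    # Evaluate each edge curve only inside the truncated interval.
    top = P.polyval(xs, theta_top)
    bottom = P.polyval(xs, theta_bottom)
    # Walk the top edge left-to-right, then the bottom edge right-to-left,
    # so the vertices trace the contour in one consistent direction.
    top_pts = np.stack([xs, top], axis=1)
    bottom_pts = np.stack([xs[::-1], bottom[::-1]], axis=1)
    return np.concatenate([top_pts, bottom_pts], axis=0)


if __name__ == "__main__":
    # Two parallel parabolic edges 20 pixels apart (made-up example values).
    poly = curve_box_polygon([10.0, 0.0, 0.01], [30.0, 0.0, 0.01], 0.0, 100.0)
    print(poly.shape)  # (32, 2)
```

In this sketch, sampling more points gives a smoother polygon at a slightly higher cost. Because each edge is a closed-form polynomial, reconstruction needs no iterative piecewise fitting.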

Paper Structure

This paper contains 17 sections, 13 equations, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Visualization of the differences between our edge approximation text representation method (bottom sub-figure) and two currently popular representations: the box-to-polygon strategy (top-left sub-figure) and the piecewise fitting process (top-right sub-figure).
  • Figure 2: Detailed visualization of the proposed edge approximation text representation method. It fits texts with a curve box constructed from the approximated edges and truncation points. Like the traditional bounding box representation, our curve box enjoys a simple reconstruction process. In particular, unlike the traditional bounding box, which covers only rectangular shapes, it can fit irregular texts accurately.
  • Figure 3: Visualization of the label generation and curve box reconstruction processes. $f(\Theta_t;x)$ and $g(\Theta_b;x)$ are the same curve-fitting function with different parameters. A polynomial is adopted as the curve-fitting function in this paper.
  • Figure 4: Visualization of the differences in contour reconstruction between existing methods and our EdgeText. Existing box-to-polygon or piecewise fitting methods must reconstruct texts one by one until all instance contours are generated. In contrast, EdgeText rebuilds all text curve boxes in the input image in parallel, which gives a more intuitive contour rebuilding process with fewer steps.
  • Figure 5: Pipeline of the proposed EdgeText, which is composed of a feature extractor, the BEP module, and three headers: edge, truncation, and concentric mask. These headers are responsible for, respectively, predicting the parameters of the edge-approximating curve-fitting function, determining the truncation points, and locating text instances.
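
Figures 3 and 4 describe two steps: label generation with a polynomial curve-fitting function, and parallel rebuilding of all curve boxes in an image. The sketch below shows one way both steps could look in NumPy. It assumes the following:

* Edge labels are obtained by a least-squares polynomial fit to annotated edge points.
* The truncation points reduce to the leftmost and rightmost x-coordinates of an edge.
* Both edges of every instance share one polynomial degree.

The names `fit_edge_label` and `batch_curve_boxes` are hypothetical, and the captions do not specify the actual fitting procedure. "Parallel" is illustrated here as vectorized array evaluation over all instances at once.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def fit_edge_label(edge_points, degree=3):
    """Hypothetical label generation: least-squares fit of one annotated edge.

    Returns coefficients (lowest degree first) and the x-range of the edge,
    which stands in for the truncation points in this sketch.
    """
    pts = np.asarray(edge_points, dtype=float)
    coeffs = P.polyfit(pts[:, 0], pts[:, 1], degree)
    return coeffs, pts[:, 0].min(), pts[:, 0].max()


def batch_curve_boxes(theta_top, theta_bottom, x_left, x_right, num_samples=16):
    """Rebuild N curve boxes at once with array operations (no per-instance loop)."""
    theta_top = np.asarray(theta_top, dtype=float)        # (N, D)
    theta_bottom = np.asarray(theta_bottom, dtype=float)  # (N, D)
    if theta_top.shape != theta_bottom.shape:
        raise ValueError("top and bottom coefficient arrays must have the same shape")
    x_left = np.asarray(x_left, dtype=float)[:, None]     # (N, 1)
    x_right = np.asarray(x_right, dtype=float)[:, None]   # (N, 1)
    t = np.linspace(0.0, 1.0, num_samples)[None, :]       # (1, K)
    xs = x_left + t * (x_right - x_left)                   # (N, K)
    # Vandermonde-style powers x^0..x^(D-1) for every instance and sample.
    powers = xs[..., None] ** np.arange(theta_top.shape[1])  # (N, K, D)
    top = np.einsum("nkd,nd->nk", powers, theta_top)
    bottom = np.einsum("nkd,nd->nk", powers, theta_bottom)
    # Top edge left-to-right, bottom edge right-to-left, for every instance.
    top_pts = np.stack([xs, top], axis=-1)                          # (N, K, 2)
    bottom_pts = np.stack([xs[:, ::-1], bottom[:, ::-1]], axis=-1)  # (N, K, 2)
    return np.concatenate([top_pts, bottom_pts], axis=1)            # (N, 2K, 2)


if __name__ == "__main__":
    xs = range(0, 101, 10)
    top, xl, xr = fit_edge_label([(x, 10 + 0.01 * x**2) for x in xs], degree=2)
    bottom, _, _ = fit_edge_label([(x, 30 + 0.01 * x**2) for x in xs], degree=2)
    boxes = batch_curve_boxes([top], [bottom], [xl], [xr])
    print(boxes.shape)  # (1, 32, 2)
```

Batching all instances into one set of array operations mirrors the idea in Figure 4: the cost of rebuilding contours does not involve a per-text loop in Python, although the total arithmetic still grows with the number of instances.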