Progressive Evolution from Single-Point to Polygon for Scene Text

Linger Deng, Mingxin Huang, Xudong Xie, Yuliang Liu, Lianwen Jin, Xiang Bai

TL;DR

This work tackles the annotation bottleneck in scene-text polygon labeling by converting single-point annotations into polygon representations. It introduces Point2Polygon, a coarse-to-fine pipeline comprising AGM, PGM, and PRM that leverages recognition information at multiple granularities and TPS-based deformation to generate accurate polygons from minimal supervision. Experiments on ICDAR2015, TotalText, and CTW1500 show that detectors trained on the generated polygons reach about 86% of the accuracy obtained when training with ground-truth (GT) polygons, and that integrating Point2Polygon with single-point spotters yields polygons with up to 82.5% accuracy. The method reduces labeling cost to single points while delivering compact, downstream-friendly text representations, providing a practical baseline for future research on single-point-to-polygon evolution.

Abstract

The advancement of text shape representations towards compactness has enhanced text detection and spotting performance, but at a high annotation cost. Current models use single-point annotations to reduce costs, yet they lack sufficient localization information for downstream applications. To overcome this limitation, we introduce Point2Polygon, which can efficiently transform single points into compact polygons. Our method uses a coarse-to-fine process: it first creates and selects anchor points based on recognition confidence, then refines the polygon vertically and horizontally using recognition information to optimize its shape. We demonstrate the accuracy of the generated polygons through extensive experiments: 1) By creating polygons from ground truth points, we achieved an accuracy of 82.0% on ICDAR 2015; 2) In training detectors with polygons generated by our method, we attained 86% of the accuracy achieved when training with ground truth (GT); 3) Additionally, the proposed Point2Polygon can be seamlessly integrated to empower single-point spotters to generate polygons. This integration led to an impressive 82.5% accuracy for the generated polygons. It is worth mentioning that our method relies solely on synthetic recognition information, eliminating the need for any manual annotation beyond single points.
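The coarse stage described in the abstract (create candidate anchors around a point, then keep the one with the highest recognition confidence) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the axis-aligned box candidates, the height and aspect-ratio grids, and especially `toy_score` (which uses overlap with a hidden box as a stand-in for a real recognizer's confidence) are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def candidate_boxes(point: Tuple[float, float],
                    heights: Sequence[float],
                    aspect_ratios: Sequence[float]) -> List[Box]:
    """Enumerate axis-aligned boxes centred on the annotated point (hypothetical scheme)."""
    cx, cy = point
    boxes = []
    for h in heights:
        for r in aspect_ratios:
            w = h * r
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def select_anchor(point: Tuple[float, float],
                  score_fn: Callable[[Box], float],
                  heights: Sequence[float] = (16, 32, 64),
                  aspect_ratios: Sequence[float] = (1, 2, 4, 8)) -> Box:
    """Return the candidate box with the highest score from score_fn."""
    return max(candidate_boxes(point, heights, aspect_ratios), key=score_fn)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# Assumption: a real system would crop the image and run a text recognizer here.
# This toy scorer pretends confidence grows with overlap with a hidden text box.
HIDDEN_TEXT_BOX: Box = (68.0, 42.0, 132.0, 58.0)


def toy_score(box: Box) -> float:
    return iou(box, HIDDEN_TEXT_BOX)


best = select_anchor((100.0, 50.0), toy_score)
print(best)
```

In this sketch the scoring function is passed in as a parameter, so a real recognition model could replace `toy_score` without changing the selection logic. The later vertical and horizontal refinement steps are not modelled here.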

Paper Structure

This paper contains 21 sections, 5 equations, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: Point2Polygon maintains cost-efficient point annotation while automatically generating polygons with high accuracy.
  • Figure 2: Adapting to the development in text shape representation, we present Point2Polygon, a novel method that significantly alleviates the constraints of point supervision.
  • Figure 3: Overview of the proposed Point2Polygon model. We use the point detector as a basis for obtaining the final text polygon, supervising it with a coarse-to-fine strategy.
  • Figure 4: Qualitative results of AGM (left), PGM (middle) and PRM (right). The upper and lower rows show results without and with the module, respectively. Best viewed on screen.
  • Figure 5: Visualization results of training using the generated polygons. The green-filled polygons are the ground truths, the blue-filled polygons are the ground truths that are marked as "don't care", and the red polygons are the detection results. The second column shows the polygons generated by our method in combination with the existing single-point text detector (SPTS v2). The third to sixth columns show the results of different detectors trained using the generated polygons. Best viewed on screen.
  • ...and 1 more figure