Progressive Evolution from Single-Point to Polygon for Scene Text
Linger Deng, Mingxin Huang, Xudong Xie, Yuliang Liu, Lianwen Jin, Xiang Bai
TL;DR
This work tackles the annotation bottleneck in scene-text polygon labeling by converting single-point annotations into polygon representations. It introduces Point2Polygon, a coarse-to-fine pipeline comprising AGM, PGM, and PRM that uses recognition information at multiple granularities and TPS-based deformation to generate accurate polygons from minimal supervision. Experiments on ICDAR 2015, TotalText, and CTW1500 show that detectors trained on the generated polygons reach about 86% of the accuracy obtained when training with ground truth (GT), and that the polygons generated when the method is integrated with single-point spotters reach up to 82.5% accuracy. The method substantially reduces labeling cost while producing compact, downstream-friendly text representations, and it provides a practical baseline for future research on single-point-to-polygon evolution.
Abstract
The advancement of text shape representations towards compactness has enhanced text detection and spotting performance, but at a high annotation cost. Current models use single-point annotations to reduce this cost, yet single points lack sufficient localization information for downstream applications. To overcome this limitation, we introduce Point2Polygon, which efficiently transforms single points into compact polygons. Our method uses a coarse-to-fine process: it first creates and selects anchor points based on recognition confidence, then refines the polygon vertically and horizontally using recognition information to optimize its shape. We demonstrate the accuracy of the generated polygons through extensive experiments: 1) when creating polygons from ground-truth points, we achieve an accuracy of 82.0% on ICDAR 2015; 2) when training detectors with polygons generated by our method, we attain 86% of the accuracy obtained by training with ground truth (GT); 3) the proposed Point2Polygon can be seamlessly integrated with single-point spotters so that they output polygons, and the polygons generated this way reach 82.5% accuracy. Notably, our method relies solely on synthetic recognition information, eliminating the need for any manual annotation beyond single points.
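To make the coarse stage described above more concrete, the following is a minimal sketch of confidence-based candidate selection around a single annotated point. It is an illustration under stated assumptions, not the authors' implementation: the names `candidate_boxes`, `select_best_box`, and `toy_confidence` are hypothetical, the candidates are plain axis-aligned boxes rather than the paper's anchor points, and the confidence function is a stand-in (overlap with a hidden reference box) for the score a real text recognizer would return.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def candidate_boxes(point: Point,
                    half_widths: List[float],
                    half_heights: List[float]) -> List[Box]:
    """Enumerate axis-aligned candidate regions centred on the annotated point."""
    x, y = point
    return [(x - w, y - h, x + w, y + h)
            for w in half_widths
            for h in half_heights]


def select_best_box(boxes: List[Box],
                    confidence: Callable[[Box], float]) -> Box:
    """Keep the candidate whose crop receives the highest confidence score."""
    return max(boxes, key=confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# ASSUMPTION: a hidden reference box plays the role of the true text extent.
# A real pipeline would instead crop the image and score it with a recognizer.
HIDDEN_TEXT_BOX: Box = (30.0, 10.0, 70.0, 30.0)


def toy_confidence(box: Box) -> float:
    """Stand-in for recognition confidence: overlap with the hidden text box."""
    return iou(box, HIDDEN_TEXT_BOX)


point: Point = (50.0, 20.0)
boxes = candidate_boxes(point, half_widths=[10.0, 20.0, 40.0],
                        half_heights=[5.0, 10.0, 20.0])
best = select_best_box(boxes, toy_confidence)
print(best)
```

In this toy setting the selected box is the candidate that overlaps the hidden text extent best. The refinement stages mentioned in the abstract (vertical and horizontal adjustment of the polygon) would start from such a coarse region and are not modelled here.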
