EK-Net: Real-time Scene Text Detection with Expand Kernel Distance
Boyuan Zhu, Fagui Liu, Xi Chen, Quan Tang
TL;DR
The paper tackles the "shrinked kernel" bias of kernel-based scene text detectors by proposing Expand Kernel Network (EK-Net), which uses an expand kernel distance and a three-stage regression framework to recover complete contours for arbitrary-shaped text. It predicts kernel, threshold, and expand-distance maps, trained under a weighted loss $L = \alpha L_k + \beta L_t + \gamma L_e$ whose components draw on BCE, Dice, L1, and smooth L1 losses together with a binary supervision map. EK-Net uses a lightweight ResNet-18 backbone with FPN for a favorable accuracy-speed trade-off, reaching F-measures of 85.72% at 35.42 FPS on ICDAR 2015 and 85.75% at 40.13 FPS on CTW1500. The results demonstrate real-time performance with competitive accuracy on challenging benchmarks and suggest extension to end-to-end text spotting as future work.
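As a rough illustration of the weighted objective $L = \alpha L_k + \beta L_t + \gamma L_e$, the following NumPy sketch combines the four named loss types. The summary does not say which loss is applied to which map, so the pairing used here (BCE plus Dice for the kernel map, L1 for the threshold map, smooth L1 for the expand-distance map), the default weights of 1.0, and the use of the binary map as a pixel mask are all assumptions made for illustration. All function names are hypothetical.

```python
import numpy as np

def bce_loss(pred, target, eps=1e-6):
    # Binary cross-entropy averaged over all pixels; pred holds probabilities.
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))))

def dice_loss(pred, target, eps=1e-6):
    # One minus the Dice coefficient between the predicted and target maps.
    inter = np.sum(pred * target)
    return float(1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps))

def l1_loss(pred, target, mask):
    # Mean absolute error restricted to pixels where mask == 1.
    return float(np.sum(np.abs(pred - target) * mask) / (np.sum(mask) + 1e-6))

def smooth_l1_loss(pred, target, mask, beta=1.0):
    # Quadratic for small errors, linear for large ones, averaged over the mask.
    diff = np.abs(pred - target)
    per_pixel = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return float(np.sum(per_pixel * mask) / (np.sum(mask) + 1e-6))

def total_loss(kernel_pred, kernel_gt, thresh_pred, thresh_gt,
               dist_pred, dist_gt, mask, alpha=1.0, beta=1.0, gamma=1.0):
    # Assumed pairing of loss types to maps; the weights are placeholders.
    l_k = bce_loss(kernel_pred, kernel_gt) + dice_loss(kernel_pred, kernel_gt)
    l_t = l1_loss(thresh_pred, thresh_gt, mask)
    l_e = smooth_l1_loss(dist_pred, dist_gt, mask)
    return alpha * l_k + beta * l_t + gamma * l_e
```

Calling `total_loss` with predictions equal to their targets returns a value close to zero, and raising `gamma` increases the penalty on expand-distance errors relative to the other two terms.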
Abstract
Recently, scene text detection has received significant attention due to its wide range of applications. However, accurate detection in complex scenes with multiple scales, orientations, and curvatures remains a challenge. Numerous detection methods adopt the Vatti clipping (VC) algorithm for multiple-instance training to address the issue of arbitrary-shaped text. Yet we identify a bias resulting from these approaches, which we call the "shrinked kernel". Specifically, it refers to a decrease in accuracy resulting from an output that overly favors the text kernel. In this paper, we propose a new approach named Expand Kernel Network (EK-Net), which uses an expand kernel distance to compensate for this deficiency and performs a three-stage regression to complete instance detection. Moreover, EK-Net not only realizes precise positioning of arbitrary-shaped text, but also achieves a trade-off between performance and speed. Evaluation results demonstrate that EK-Net achieves state-of-the-art or competitive performance compared to other advanced methods, e.g., an F-measure of 85.72% at 35.42 FPS on ICDAR 2015 and an F-measure of 85.75% at 40.13 FPS on CTW1500.
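To make the shrinking bias concrete, the sketch below computes the offset by which Vatti clipping shrinks a text polygon, using the rule common in kernel-based detectors, $d = A(1 - r^2)/P$, where $A$ is the polygon area, $P$ its perimeter, and $r$ the shrink ratio. Whether EK-Net uses exactly this rule is an assumption, and the ratio of 0.4 is an illustrative value rather than one taken from this paper. The function names are hypothetical.

```python
import math

def polygon_area(points):
    # Shoelace formula for a simple polygon given as [(x, y), ...].
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def polygon_perimeter(points):
    # Sum of edge lengths, closing the polygon back to the first vertex.
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))

def shrink_offset(points, ratio=0.4):
    # Offset d = A * (1 - r^2) / P applied inward on every side (assumed rule).
    return polygon_area(points) * (1.0 - ratio ** 2) / polygon_perimeter(points)

# A 100 x 20 pixel text box: area 2000, perimeter 240.
box = [(0, 0), (100, 0), (100, 20), (0, 20)]
offset = shrink_offset(box)
```

For this box the offset is about 7 pixels per side, so the 20-pixel-tall region shrinks to a kernel roughly 6 pixels tall. A detector whose output stays close to that kernel therefore misses most of the text height unless the contour is expanded back, which is the gap the expand kernel distance is meant to close.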
