LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network
Yuchen Su, Zhineng Chen, Zhiwen Shao, Yuning Du, Zhilong Ji, Jinfeng Bai, Yong Zhou, Yu-Gang Jiang
TL;DR
LRANet introduces Low-Rank Approximation (LRA) to represent arbitrary-shaped text contours as a linear combination of eigenanchors learned from labeled contours, enabling compact and geometry-aware decoding. A dual assignment scheme combines dense supervision during training with sparse, fast inference to boost speed without sacrificing accuracy, implemented in a single-stage LRANet detector. Evaluations on CTW1500, Total-Text, and MSRA-TD500 show state-of-the-art performance with strong efficiency, validating the effectiveness of LRA for text-specific shape modeling and the practical benefit of the dual-assignment strategy. The approach offers a scalable, robust solution for real-time, arbitrary-shaped scene text detection and has potential for extension to text spotting.
Abstract
Regression-based methods, which localize text by predicting parameterized text shapes, have recently gained popularity in scene text detection. However, existing parameterized shape methods remain limited in modeling arbitrary-shaped text because they do not exploit text-specific shape information. Moreover, the time consumption of the entire pipeline has been largely overlooked, leading to suboptimal overall inference speed. To address these issues, we first propose a novel parameterized text shape method based on low-rank approximation. Unlike other shape representation methods that employ data-irrelevant parameterization, our approach uses singular value decomposition and reconstructs the text shape from a few eigenvectors learned from labeled text contours. By exploring the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation. Next, we propose a dual assignment scheme for speed acceleration. It adopts a sparse assignment branch to accelerate inference while providing ample supervision signals for training through a dense assignment branch. Building on these designs, we implement an accurate and efficient arbitrary-shaped text detector named LRANet. Extensive experiments on several challenging benchmarks demonstrate the superior accuracy and efficiency of LRANet compared to state-of-the-art methods. Code is available at https://github.com/ychensu/LRANet.git
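The low-rank shape representation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`make_ellipses`, `learn_eigenanchors`, `encode`, `decode`), the synthetic ellipse contours, the 14-point sampling, and the choice of rank 4 are all assumptions made for illustration. The sketch stacks flattened contours as columns of a matrix, takes its singular value decomposition, keeps the leading left singular vectors as a shape basis, and represents each contour by its coefficients in that basis.

```python
import numpy as np


def make_ellipses(rng, n, num_points=14):
    """Hypothetical stand-in for labeled text contours: n axis-aligned ellipses.

    Each row is one contour flattened as [x_1..x_N, y_1..y_N].
    """
    t = np.linspace(0.0, 2.0 * np.pi, num_points, endpoint=False)
    cx, cy = rng.uniform(0.0, 100.0, size=(2, n))   # centers
    a, b = rng.uniform(5.0, 50.0, size=(2, n))      # semi-axes
    xs = cx[:, None] + a[:, None] * np.cos(t)[None, :]
    ys = cy[:, None] + b[:, None] * np.sin(t)[None, :]
    return np.concatenate([xs, ys], axis=1)


def learn_eigenanchors(contours, rank):
    """Return a (2 * num_points, rank) basis with orthonormal columns.

    The columns are the leading left singular vectors of the matrix whose
    columns are the training contour vectors.
    """
    u, _, _ = np.linalg.svd(contours.T, full_matrices=False)
    return u[:, :rank]


def encode(contour, anchors):
    """Project a flattened contour onto the basis: `rank` coefficients."""
    return anchors.T @ contour


def decode(coeffs, anchors):
    """Reconstruct a flattened contour as a linear combination of the basis."""
    return anchors @ coeffs


rng = np.random.default_rng(0)
train = make_ellipses(rng, 200)
anchors = learn_eigenanchors(train, rank=4)

# A contour not seen during basis learning.
held_out = make_ellipses(rng, 1)[0]
coeffs = encode(held_out, anchors)
reconstruction = decode(coeffs, anchors)
print("coefficients per contour:", coeffs.shape[0])
print("max reconstruction error:", np.abs(reconstruction - held_out).max())
```

In this toy setting every contour is determined by four numbers (two center coordinates and two semi-axes) that enter the flattened vector linearly, so four basis vectors reconstruct a 28-dimensional contour up to floating-point error. Real text contours are not exactly low-rank, so the number of basis vectors trades compactness against reconstruction quality.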
