LRANet++: Low-Rank Approximation Network for Accurate and Efficient Text Spotting

Yuchen Su; Zhineng Chen; Yongkun Du; Zuxuan Wu; Hongtao Xie; Yu-Gang Jiang

TL;DR

LRANet++ tackles accurate and efficient end-to-end spotting of arbitrary-shaped text by introducing a data-driven low-rank approximation (LRA) to model text contours and a triple assignment detection head that decouples training from inference. The LRA uses the robust Fast Median Subspace (FMS) algorithm to derive orthonormal basis vectors, called orthanchors, which give compact and stable contour representations; the triple assignment head combines a deep sparse teacher, a dense auxiliary branch, and a shallow sparse student to preserve accuracy while accelerating inference. A Transformer-based recognition head with TPS alignment enables efficient end-to-end transcription via CTC decoding, aided by large-ratio image scaling to mitigate RoI-induced distortions. Experiments on CTW1500, Total-Text, and multilingual benchmarks show state-of-the-art end-to-end F-measures at real-time speeds.
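The recognition head is described as transcribing text through CTC decoding. As a minimal sketch of the standard greedy (best-path) CTC rule, assuming per-timestep class scores with the blank at index 0 and a hypothetical lowercase charset (the actual LRANet++ vocabulary, blank index, and decoder variant are not specified here), decoding takes the argmax at each step, merges consecutive repeats, and then drops blanks:

```python
from itertools import groupby

import numpy as np


def ctc_greedy_decode(logits, charset):
    """Greedy (best-path) CTC decoding.

    logits: (T, C) per-timestep class scores with C == len(charset) + 1.
            Class 0 is the CTC blank; class k > 0 maps to charset[k - 1]
            (a hypothetical label layout).
    """
    best = np.argmax(logits, axis=1)                          # most likely class per time step
    merged = [k for k, _ in groupby(best.tolist())]           # merge consecutive repeats
    return "".join(charset[k - 1] for k in merged if k != 0)  # then drop blanks


charset = "abcdefghijklmnopqrstuvwxyz"
path = [2, 15, 15, 0, 15, 11, 11]          # b o o <blank> o k k
logits = np.eye(len(charset) + 1)[path]    # one-hot scores standing in for model output
print(ctc_greedy_decode(logits, charset))  # → book
```

Because repeats are merged before blanks are removed, the blank between the two "o" runs is what lets the decoder emit a genuine double letter; the same path without that blank collapses to "bok".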

Abstract

End-to-end text spotting aims to jointly optimize text detection and recognition within a unified framework. Despite significant progress, designing an accurate and efficient end-to-end text spotter for arbitrary-shaped text remains largely unsolved. We identify the primary bottleneck as the lack of a reliable and efficient text detection method. To address this, we propose a novel parameterized text shape representation based on low-rank approximation for precise detection, and a triple assignment detection head for fast inference. Specifically, unlike other shape representation methods that use data-independent parameterization, our data-driven approach derives a low-rank subspace directly from labeled text boundaries. To make this process robust to the annotation noise inherent in such labels, we use a robust subspace recovery method based on an $\ell_1$-norm formulation, which accurately reconstructs a text shape from only a few key orthogonal vectors. By exploiting the shape correlation among different text contours, our method achieves a consistent and compact shape representation. Next, the triple assignment head uses a deep sparse branch (for stable training) to guide the learning of an ultra-lightweight sparse branch (for fast inference), while a dense branch provides rich parallel supervision. Building on these advances, we integrate the enhanced detection module with a lightweight recognition branch to form an end-to-end text spotting framework, termed LRANet++, that accurately and efficiently spots arbitrary-shaped text. Extensive experiments on several challenging benchmarks demonstrate the superiority of LRANet++ over state-of-the-art methods. Code will be available at: https://github.com/ychensu/LRANet-PP.git
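The low-rank contour representation can be illustrated with a short NumPy sketch. It assumes each text boundary is sampled as N points and flattened into a 2N-dimensional vector, and it learns a d-dimensional linear subspace with the Fast Median Subspace (FMS) iteration mentioned in the TL;DR: an iteratively reweighted scheme that minimizes the sum of (unsquared) distances to the subspace, so noisy annotations pull on the basis far less than they would under PCA. A contour is then encoded as d coefficients by projection onto the orthonormal basis and decoded by a single matrix product. The point count, the rank, the absence of any centering or scale normalization, and the helper names `encode`/`decode` are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def fast_median_subspace(X, d, n_iter=100, delta=1e-10, tol=1e-9):
    """Robust d-dimensional linear subspace of the rows of X via FMS (IRLS).

    X: (n, D) matrix, one flattened contour (x1, y1, ..., xN, yN) per row.
    Returns U: (D, d) with orthonormal columns, approximately minimizing
    sum_i ||x_i - U U^T x_i||_2 (sum of distances, not squared distances).
    """
    # Initialize with plain PCA on uncentered data (top-d right singular vectors).
    U = np.linalg.svd(X, full_matrices=False)[2][:d].T
    for _ in range(n_iter):
        dist = np.linalg.norm(X - (X @ U) @ U.T, axis=1)
        # Reweighting: rows far from the current subspace (noisy labels) count less.
        w = 1.0 / np.maximum(dist, delta)
        C = (X * w[:, None]).T @ X      # weighted second-moment matrix, (D, D)
        _, vecs = np.linalg.eigh(C)     # eigenvalues in ascending order
        U_new = vecs[:, ::-1][:, :d]    # top-d eigenvectors
        change = np.linalg.norm(U_new @ U_new.T - U @ U.T)
        U = U_new
        if change < tol:
            break
    return U


def encode(contour, U):
    """(N, 2) contour -> d coefficients (projection onto the orthonormal basis)."""
    return U.T @ contour.reshape(-1)


def decode(coeffs, U):
    """d coefficients -> (N, 2) contour."""
    return (U @ coeffs).reshape(-1, 2)


# Synthetic demo: 14-point contours lying in a 4-dim subspace, plus 10% outliers.
rng = np.random.default_rng(0)
N, d = 14, 4
B, _ = np.linalg.qr(rng.normal(size=(2 * N, d)))   # hidden "true" basis
inliers = rng.normal(size=(200, d)) @ B.T
outliers = rng.normal(size=(20, 2 * N))            # stand-in for annotation noise
U = fast_median_subspace(np.vstack([inliers, outliers]), d)

contour = (B @ rng.normal(size=d)).reshape(N, 2)   # an unseen clean contour
recon = decode(encode(contour, U), U)
print(np.abs(recon - contour).max())
```

On this synthetic data the recovered basis reproduces an unseen clean contour almost exactly despite the outlier rows. The 2N × d matrix U plays the role of the few orthogonal vectors described in the abstract, with d much smaller than 2N, so each text instance is regressed as only d numbers.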
