A Lightweight Multi-Module Fusion Approach for Korean Character Recognition
Inho Jake Park, Jaehoon Jay Jeong, Ho-Sang Jo
TL;DR
SDA-Net targets robust single-character Korean OCR by combining Stroke-Sensitive Attention (SSA) with edge-aware spatial cues, a Dynamic Context Encoding (DCE) module, and a U-Net-inspired skip fusion on a lightweight ResNet-based backbone. The model achieves high accuracy in challenging conditions while keeping memory and compute demands low (5.6M parameters, 3.4 GFLOPs), which enables real-time edge deployment. A new dataset, ACPSLD, for license-plate-level single-character recognition supports real-world evaluation, including strict KNPA-aligned correctness metrics and on-site testing, in which the model performs strongly. Ablation studies show incremental gains from SSA, edge attention, and DCE, underscoring the value of integrated attention, context refinement, and multi-scale fusion for robust OCR. The work advances practical OCR for traffic surveillance by delivering accurate, efficient recognition under diverse, real-world conditions.
Abstract
Optical Character Recognition (OCR) is essential in applications such as document processing, license plate recognition, and intelligent surveillance. However, existing OCR models often underperform in real-world scenarios due to irregular text layouts, poor image quality, character variability, and high computational costs. This paper introduces SDA-Net (Stroke-Sensitive Attention and Dynamic Context Encoding Network), a lightweight and efficient architecture designed for robust single-character recognition. SDA-Net incorporates: (1) a Dual Attention Mechanism to enhance stroke-level and spatial feature extraction; (2) a Dynamic Context Encoding module that adaptively refines semantic information using a learnable gating mechanism; (3) a U-Net-inspired Feature Fusion Strategy for combining low-level and high-level features; and (4) a highly optimized lightweight backbone that reduces memory and computational demands. Experimental results show that SDA-Net achieves state-of-the-art accuracy on challenging OCR benchmarks, with significantly faster inference, making it well-suited for deployment in real-time and edge-based OCR systems.
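To make item (2) concrete, the following is a minimal sketch of what a learnable gating step could look like; it is not the authors' implementation. It assumes a per-channel sigmoid gate that blends a feature value with a context value. The function name `gated_refine`, the two-weight gate parameterization, and the toy inputs are all hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_refine(
    features: Sequence[float],
    context: Sequence[float],
    weights: Sequence[Tuple[float, float]],
    biases: Sequence[float],
) -> List[float]:
    """Blend each feature with its context value through a learned gate.

    Assumed (hypothetical) form, per channel i:
        g_i   = sigmoid(wf_i * f_i + wc_i * c_i + b_i)
        out_i = g_i * c_i + (1 - g_i) * f_i
    A gate near 1 favors the context; a gate near 0 keeps the feature.
    """
    if not (len(features) == len(context) == len(weights) == len(biases)):
        raise ValueError("all inputs must have the same number of channels")
    refined = []
    for f, c, (wf, wc), b in zip(features, context, weights, biases):
        g = sigmoid(wf * f + wc * c + b)
        refined.append(g * c + (1.0 - g) * f)
    return refined


if __name__ == "__main__":
    feats = [1.0, 2.0]
    ctx = [3.0, 6.0]
    # Zero weights and biases give a gate of 0.5, i.e. a plain average.
    print(gated_refine(feats, ctx, [(0.0, 0.0)] * 2, [0.0, 0.0]))
```

In a trained network the gate weights and biases would be learned jointly with the rest of the model, and the scalar channels here would be replaced by feature maps; the sketch only shows how a gate can interpolate between raw features and contextual information.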
