A Lightweight Multi-Module Fusion Approach for Korean Character Recognition

Inho Jake Park, Jaehoon Jay Jeong, Ho-Sang Jo

TL;DR

SDA-Net targets robust single-character Korean OCR. It combines Stroke-Sensitive Attention (SSA) with edge-aware spatial cues, a Dynamic Context Encoding (DCE) module, and U-Net-inspired skip fusion, all built on a lightweight ResNet-based backbone. The model achieves high accuracy under challenging conditions while keeping memory and compute demands low (5.6M parameters, 3.4 GFLOPs), which enables real-time deployment on edge devices. A new dataset, ACPSLD, built for single-character recognition on license plates, supports real-world evaluation. This evaluation includes strict, KNPA-aligned correctness metrics and on-site testing, where the model performs notably well. Ablation studies show incremental gains from SSA, edge attention, and DCE, underscoring the value of combining attention, context refinement, and multi-scale fusion for robust OCR. The work advances practical OCR for traffic surveillance by delivering accurate, efficient recognition under diverse real-world conditions.
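The summary mentions "edge-aware spatial cues" but does not give their formulation. The sketch below shows one plausible way such a cue could gate features, under stated assumptions. It uses a Sobel gradient magnitude (a swapped-in, classical edge detector, not confirmed as the authors' choice), normalizes it to [0, 1], and passes it through a sigmoid to reweight a single-channel feature map. The names `edge_attention`, `scale`, and `bias` are hypothetical, and plain Python lists stand in for tensors.

```python
import math

# Sobel kernels for horizontal (x) and vertical (y) intensity gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Per-pixel gradient magnitude of a 2D grayscale image given as a list of rows.

    Borders use replicate padding (indices are clamped), so a flat image
    yields zero gradient everywhere.
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            gx = gy = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y = min(max(i + di, 0), h - 1)
                    x = min(max(j + dj, 0), w - 1)
                    gx += SOBEL_X[di + 1][dj + 1] * img[y][x]
                    gy += SOBEL_Y[di + 1][dj + 1] * img[y][x]
            mag[i][j] = math.hypot(gx, gy)
    return mag


def edge_attention(feature, img, scale=1.0, bias=0.0):
    """Reweight a single-channel feature map with a sigmoid edge mask.

    `scale` and `bias` are hypothetical stand-ins for learnable parameters.
    Edge magnitudes are normalized to [0, 1] by the image's peak response.
    """
    mag = sobel_magnitude(img)
    peak = max(max(row) for row in mag) or 1.0  # flat image: avoid dividing by zero
    # f / (1 + exp(-z)) is f * sigmoid(z): stroke edges (high z) keep more of f.
    return [
        [f / (1.0 + math.exp(-(scale * m / peak + bias))) for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(feature, mag)
    ]
```

For example, on a 4x4 image with a vertical step (`[[0, 0, 1, 1]] * 4`) and an all-ones feature map, the two columns next to the step receive a mask of sigmoid(1) ≈ 0.73, while the flat border columns receive 0.5. In a real network the same idea would run per channel on tensors, with learned parameters instead of fixed ones.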

Abstract

Optical Character Recognition (OCR) is essential in applications such as document processing, license plate recognition, and intelligent surveillance. However, existing OCR models often underperform in real-world scenarios due to irregular text layouts, poor image quality, character variability, and high computational costs. This paper introduces SDA-Net (Stroke-Sensitive Attention and Dynamic Context Encoding Network), a lightweight and efficient architecture designed for robust single-character recognition. SDA-Net incorporates: (1) a Dual Attention Mechanism to enhance stroke-level and spatial feature extraction; (2) a Dynamic Context Encoding module that adaptively refines semantic information using a learnable gating mechanism; (3) a U-Net-inspired Feature Fusion Strategy for combining low-level and high-level features; and (4) a highly optimized lightweight backbone that reduces memory and computational demands. Experimental results show that SDA-Net achieves state-of-the-art accuracy on challenging OCR benchmarks, with significantly faster inference, making it well-suited for deployment in real-time and edge-based OCR systems.
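The abstract says the Dynamic Context Encoding module refines semantic information with a learnable gating mechanism, but it does not give the equations. The sketch below assumes a common highway/GRU-style form. A global context vector c is obtained by average pooling over positions. A sigmoid gate g = σ(W[x; c] + b) is computed from each local feature concatenated with that context. The output is the convex blend g·c + (1 − g)·x. This is an illustrative stand-in, not the paper's confirmed formulation. `gated_context`, `w_gate`, and `b_gate` are hypothetical names.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_context(x, w_gate, b_gate):
    """Blend each position's feature with a global context vector via a sigmoid gate.

    x:      list of N feature vectors, each of length C (e.g. flattened spatial positions).
    w_gate: C rows of length 2C (hypothetical learnable weights over [x; context]).
    b_gate: length-C bias (hypothetical learnable bias).
    Returns, per position, g * context + (1 - g) * x element-wise.
    """
    n, c = len(x), len(x[0])
    # Global context: average pooling over all positions.
    ctx = [sum(v[k] for v in x) / n for k in range(c)]
    out = []
    for v in x:
        z = v + ctx  # list concatenation -> [x; context], length 2C
        g = [sigmoid(sum(w * zi for w, zi in zip(row, z)) + b) for row, b in zip(w_gate, b_gate)]
        out.append([gk * ck + (1.0 - gk) * vk for gk, ck, vk in zip(g, ctx, v)])
    return out
```

With zero weights and bias, every gate is 0.5, so `gated_context([[0.0, 2.0], [2.0, 0.0]], [[0.0] * 4] * 2, [0.0, 0.0])` averages each feature with the context `[1.0, 1.0]` to give `[[0.5, 1.5], [1.5, 0.5]]`. A gate near 0 passes local features through unchanged, and a gate near 1 replaces them with global context. That adaptivity is the general property the abstract attributes to the module.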

Paper Structure

This paper contains 39 sections, 18 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Proposed SDA-Net architecture
  • Figure 2: Example license plate types and formats in the ACPSLD dataset.
  • Figure 3: Evaluation process for license plate recognition using real-time CCTV feed.
  • Figure 4: Various license plates from on-site locations
  • Figure 5: Examples of difficult cases in the ACPSLD dataset, including dust-covered plates, occluded characters, and motorcycle license plates.