LPLC: A Dataset for License Plate Legibility Classification

Lucas Wojcik, Gabriel E. Lima, Valfride Nascimento, Eduil Nascimento, Rayson Laroca, David Menotti

TL;DR

This work tackles the gap between image quality and license plate (LP) legibility in automatic license plate recognition (ALPR) by introducing the LPLC dataset, a large, finely annotated collection designed to classify LP legibility and guide selective image preprocessing. It benchmarks three image classifiers (ResNet, ViT, and YOLO-cls) across multiple legibility-based tasks and evaluates several super-resolution (SR) methods. It finds that most models struggle to distinguish legible from illegible LPs, and that SR often fails to improve recognition or even harms it. The results highlight the need for better legibility-aware preprocessing strategies and cross-domain SR models, and the dataset provides a public platform for advancing LP legibility classification and related ALPR tasks. Overall, the paper establishes a challenging benchmark and shows how difficult it is to improve LP recognition through preprocessing alone.
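
The idea of legibility-guided selective preprocessing can be illustrated with a minimal routing sketch. This is not the authors' code. The names `LegibilityClass` and `recognize_selectively` are hypothetical, and `classify`, `super_resolve`, and `recognize` are placeholder callables standing in for a legibility classifier, an SR model, and an LP recognizer. The sketch assumes a three-way decision (good enough, needs SR, unrecoverable) in which only plates judged to need enhancement pay the SR cost.

```python
from enum import Enum
from typing import Any, Callable, Optional


class LegibilityClass(Enum):
    """Hypothetical three-way decision used only for illustration."""
    GOOD_ENOUGH = "good_enough"      # recognize the crop as-is
    NEEDS_SR = "needs_sr"            # enhance with super-resolution first
    UNRECOVERABLE = "unrecoverable"  # skip recognition entirely


def recognize_selectively(
    lp_image: Any,
    classify: Callable[[Any], LegibilityClass],
    super_resolve: Callable[[Any], Any],
    recognize: Callable[[Any], str],
) -> Optional[str]:
    """Route one LP crop through a legibility gate before recognition.

    Returns the recognized string, or None when the plate is judged
    unrecoverable. All three callables are placeholders (assumptions).
    """
    decision = classify(lp_image)
    if decision is LegibilityClass.UNRECOVERABLE:
        return None  # preprocessing is not expected to help this plate
    if decision is LegibilityClass.NEEDS_SR:
        lp_image = super_resolve(lp_image)  # SR is applied only here
    return recognize(lp_image)
```

With stub callables, a crop labeled `GOOD_ENOUGH` goes straight to recognition, a `NEEDS_SR` crop is enhanced first, and an `UNRECOVERABLE` crop returns `None` without invoking either model. Whether a given classifier is accurate enough for this gate to pay off is exactly the open question the benchmark probes.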

Abstract

Automatic License Plate Recognition (ALPR) faces a major challenge when dealing with illegible license plates (LPs). While reconstruction methods such as super-resolution (SR) have emerged, the core issue of recognizing these low-quality LPs remains unresolved. To optimize model performance and computational efficiency, image pre-processing should be applied selectively to cases that require enhanced legibility. To support research in this area, we introduce a novel dataset comprising 10,210 images of vehicles with 12,687 annotated LPs for legibility classification (the LPLC dataset). The images span a wide range of vehicle types, lighting conditions, and camera/image quality levels. We adopt a fine-grained annotation strategy that includes vehicle- and LP-level occlusions, four legibility categories (perfect, good, poor, and illegible), and character labels for three categories (excluding illegible LPs). As a benchmark, we propose a classification task using three image recognition networks to determine whether an LP image is good enough, requires super-resolution, or is completely unrecoverable. The overall F1 score, which remained below 80% for all three baseline models (ViT, ResNet, and YOLO), together with the analyses of SR and LP recognition methods, highlights the difficulty of the task and reinforces the need for further research. The proposed dataset is publicly available at https://github.com/lmlwojcik/lplc-dataset.
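
The benchmark task and its metric can be made concrete with a small sketch. The abstract does not say how the four annotated legibility levels map onto the three classes, so the grouping below is an assumption: perfect and good map to "good enough", poor maps to "requires SR", and illegible maps to "unrecoverable". The abstract also does not specify which averaging the "overall F1 score" uses. The sketch therefore computes macro-averaged F1 (the unweighted mean of per-class F1) as one common choice. `LEVEL_TO_CLASS` and `macro_f1` are hypothetical names.

```python
# Assumed grouping of the four annotated legibility levels into the three
# benchmark classes; the paper's exact task definitions may differ.
LEVEL_TO_CLASS = {
    "perfect": "good_enough",
    "good": "good_enough",
    "poor": "needs_sr",
    "illegible": "unrecoverable",
}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        per_class.append(2 * precision * recall / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy example: annotated levels for five LPs and a classifier's predictions.
levels = ["perfect", "good", "poor", "illegible", "poor"]
y_true = [LEVEL_TO_CLASS[lvl] for lvl in levels]
y_pred = ["good_enough", "needs_sr", "needs_sr", "unrecoverable", "good_enough"]
score = macro_f1(y_true, y_pred)  # (0.5 + 0.5 + 1.0) / 3
```

In this toy example, confusions between the "good enough" and "needs SR" classes pull their F1 down to 0.5, while the perfectly separated "unrecoverable" class scores 1.0. Macro averaging weights each class equally regardless of how many plates it contains.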

Paper Structure

This paper contains 7 sections, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Examples illustrating the distinction between image quality and LP legibility. High-quality images may contain illegible LPs (top right), while low-quality images can still include legible ones (bottom right). A single image may also contain both legible and illegible LPs (left).
  • Figure 2: Samples from the LPLC dataset.
  • Figure 3: OCR legibility levels.
  • Figure 4: Illustration of the cross-fold splits.
  • Figure 5: Confusion matrix for one run of YOLO-cls.
  • ...and 1 more figure