Global License Plate Dataset
Siddharth Agrawal
TL;DR
The paper introduces the Global License Plate Dataset (GLPD), a large-scale multinational resource of over 5 million images from 74 countries, designed to benchmark license plate recognition under diverse, real-world conditions. It describes how the data were collected, primarily from Platesmania and supplemented by open sources. It also describes the extensive annotations (plate text, four corner vertices, segmentation maps, and vehicle attributes) and the ancillary COCO-style labels provided for a subset. The data follow a defined 60/20/20 train/validation/test split, with near-duplicates avoided using the Normalised Edit Distance $\mathrm{NED}(a,b) = \frac{\mathrm{dist}(a,b)}{\max(\mathrm{len}(a), \mathrm{len}(b))}$, where $\mathrm{dist}(a,b)$ is the edit distance between strings $a$ and $b$. The paper presents end-to-end evaluation strategies, with detection by YOLOv5m and recognition by CRNN and PARSeq, and reports strong cross-country performance, with PARSeq achieving the higher accuracy of the two recognizers. It also discusses ethical considerations, emphasizing privacy protections and controlled sampling to mitigate bias, and stresses GLPD’s potential to improve generalization and enable country-specific fine-tuning of license plate recognition systems.
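To make the NED formula concrete, here is a minimal Python sketch. It assumes that $\mathrm{dist}$ is the standard Levenshtein distance with unit costs. It also shows one possible way to keep near-duplicate plate strings inside a single split. The helper names (`levenshtein`, `ned`, `assign_splits`) and the `threshold=0.2` cut-off are hypothetical illustrations; the summary does not state the actual threshold or the exact splitting procedure. Near-duplicates are grouped with union-find, so a chain of similar plates always lands in the same split. Each group is then randomly assigned to train/val/test with 60/20/20 probabilities.

```python
import random
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def ned(a: str, b: str) -> float:
    """Normalised Edit Distance: dist(a, b) / max(len(a), len(b))."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def assign_splits(plates: List[str], threshold: float = 0.2, seed: int = 0) -> List[str]:
    """Hypothetical near-duplicate-aware 60/20/20 split.

    Plates whose NED is <= threshold (an assumed cut-off) are merged into one
    group via union-find, and each whole group goes to a single split.
    """
    parent = list(range(len(plates)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # O(n^2) pairwise comparison: fine for a sketch, not for millions of plates.
    for i in range(len(plates)):
        for j in range(i + 1, len(plates)):
            if ned(plates[i], plates[j]) <= threshold:
                parent[find(i)] = find(j)

    rng = random.Random(seed)
    group_split = {}
    splits = []
    for i in range(len(plates)):
        root = find(i)
        if root not in group_split:
            r = rng.random()
            group_split[root] = "train" if r < 0.6 else ("val" if r < 0.8 else "test")
        splits.append(group_split[root])
    return splits
```

For example, `ned("ABC123", "ABC124")` is 1/6 (about 0.167), so under the assumed threshold these two plates are always placed in the same split. Because whole groups are assigned, the final proportions only approximate 60/20/20. At GLPD's scale, the pairwise loop would likely need to be replaced by some form of blocking, such as comparing only plates from the same country.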
Abstract
In pursuit of advancing the state of the art (SOTA) in road safety, traffic monitoring, surveillance, and logistics automation, we introduce the Global License Plate Dataset (GLPD). The dataset consists of over 5 million images captured in 74 countries. It comes with meticulous annotations, including license plate characters, license plate segmentation masks, license plate corner vertices, and vehicle make, colour, and model. We also provide annotations for additional classes, such as pedestrians, vehicles, and roads. We present a statistical analysis of the dataset and provide efficient and accurate baseline models. The GLPD aims to be the primary benchmark dataset for developing and fine-tuning license plate recognition models.
