License Plate Images Generation with Diffusion Models

Mariia Shpir; Nadiya Shvai; Amir Nakib

TL;DR

This work tackles data scarcity in license plate recognition (LPR), which stems from privacy regulations such as GDPR, by training a diffusion model (DDPM) on Ukrainian plates to synthesize realistic license plate (LP) images. The authors generate 1,000 synthetic plates for detailed manual analysis and release a 10,000-image synthetic Ukrainian LP dataset to support broader LPR research. They analyze the readability, symbol distributions, and regional distributions of the generated plates, and evaluate them on an LPR task: expanding the training set with pseudolabeled synthetic data improves accuracy by about 3 percentage points over the baseline. The findings indicate that diffusion-based data augmentation is practical for LPR and provide a resource for future research and benchmarking in GDPR-constrained settings.

Abstract

Despite the evident practical importance of license plate recognition (LPR), research in this area is limited by the small volume of publicly available datasets, a consequence of privacy regulations such as the General Data Protection Regulation (GDPR). To address this challenge, synthetic data generation has emerged as a promising approach. In this paper, we propose to synthesize realistic license plates (LPs) using diffusion models, inspired by recent advances in image and video generation. In our experiments, a diffusion model was successfully trained on a Ukrainian LP dataset, and 1,000 synthetic images were generated for detailed analysis. Through manual classification and annotation of the generated images, we performed a thorough study of the model output, including success rate, character distributions, and types of failure. Our contributions include experimental validation of the efficacy of diffusion models for LP synthesis, along with insights into the characteristics of the generated data. Furthermore, we have prepared a synthetic dataset of 10,000 LP images, publicly available at https://zenodo.org/doi/10.5281/zenodo.13342102. Our experiments empirically confirm the usefulness of synthetic data for the LPR task. Despite an initial performance gap between models trained on real data and on synthetic data, expanding the training set with pseudolabeled synthetic data improves LPR accuracy by 3% compared to the baseline.
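The pseudolabeling step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the recognizer interface, the function names `pseudolabel` and `expand_training_set`, and the confidence threshold of 0.9 are all assumptions made for illustration, and the abstract does not say whether confidence filtering was used at all.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical recognizer interface (an assumption, not from the paper):
# it maps an image to (predicted plate text, confidence in [0, 1]).
Recognizer = Callable[[object], Tuple[str, float]]


def pseudolabel(
    images: Iterable[object],
    recognizer: Recognizer,
    min_confidence: float = 0.9,  # assumed threshold, chosen for illustration
) -> List[Tuple[object, str]]:
    """Label unlabeled images with a trained recognizer.

    Only predictions at or above `min_confidence` are kept, so that
    unreadable synthetic plates do not enter the training set.
    """
    labeled = []
    for image in images:
        text, confidence = recognizer(image)
        if confidence >= min_confidence:
            labeled.append((image, text))
    return labeled


def expand_training_set(
    real_labeled: Iterable[Tuple[object, str]],
    synthetic_images: Iterable[object],
    recognizer: Recognizer,
    min_confidence: float = 0.9,
) -> List[Tuple[object, str]]:
    """Return the real labeled pairs followed by pseudolabeled synthetic pairs."""
    return list(real_labeled) + pseudolabel(
        synthetic_images, recognizer, min_confidence
    )
```

In this sketch, a recognizer trained on real plates would be passed as `recognizer`, and the returned list would be used to retrain or fine-tune the LPR model.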

Paper Structure (29 sections, 10 figures, 4 tables)

Introduction
Related Work
License Plate Generation
License Plate Datasets
Experiments, Results and Discussion
Experimental Setup
Dataset
Standardization of Ukrainian Vehicle License Plate Codes.
Dataset Details.
Generative Model Training
LP Generation Results
Successful Image Generation Criteria.
Analysis of Image Generation Success and Failure Distribution.
Quantitative Generation Quality Metrics.
Character Distribution Analysis
...and 14 more sections

Figures (10)

Figure 1: License plate formats and dataset samples. The top section shows the format types of Ukrainian LPs used in this study, and the bottom section provides corresponding sample images from the dataset.
Figure 2: Visualization of the generated license plate images at different training stages.
Figure 3: Examples of successful LP image generation.
Figure 4: Examples of failed LP image generation.
Figure 5: Distribution of image categories for synthetic (left) and real (right) LP images.
...and 5 more figures
