Advancing Generative Model Evaluation: A Novel Algorithm for Realistic Image Synthesis and Comparison in OCR System

Majid Memari; Khaled R. Ahmed; Shahram Rahimi; Noorbakhsh Amiri Golilarz

TL;DR

The paper tackles the challenge of objectively evaluating generative models that synthesize realistic Arabic handwritten digits to boost OCR performance. It introduces LFID, a low-dimensional Fréchet distance, alongside a Synthetic Image Evaluation Procedure that monitors image quality and enables early stopping during training. In comparative experiments with C-GAN and C-VAE on AHDD, the study finds that C-VAE generally yields OCR gains and trains faster, while C-GAN produces sharper images with limited OCR benefit; LFID predicts downstream OCR improvements better than the traditional FID. Saliency maps confirm that C-VAE focuses on discriminative digit features, supporting robust digit recognition. Overall, the LFID-based framework provides a practical, real-time evaluation and data-augmentation approach that advances OCR for complex scripts and offers a benchmark for future generative-model evaluation in OCR contexts.

Abstract

This research addresses a critical challenge in the field of generative models, particularly in the generation and evaluation of synthetic images. Given the inherent complexity of generative models and the absence of a standardized procedure for their comparison, our study introduces a pioneering algorithm to objectively assess the realism of synthetic images. This approach significantly enhances the evaluation methodology by refining the Fréchet Inception Distance (FID) score, allowing for a more precise and objective assessment of image quality. Our algorithm is particularly tailored to address the challenges in generating and evaluating realistic images of Arabic handwritten digits, a task that has traditionally been near-impossible due to the subjective nature of realism in image generation. By providing a systematic and objective framework, our method not only enables the comparison of different generative models but also paves the way for improvements in their design and output. This breakthrough in evaluation and comparison is crucial for advancing the field of OCR, especially for scripts that present unique complexities, and sets a new standard in the generation and assessment of high-quality synthetic images.
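
For readers unfamiliar with the quantity being refined, the sketch below computes the standard Fréchet distance between two sets of feature vectors, treating each set as a Gaussian: the squared distance between the means plus Tr(Σa + Σb − 2(ΣaΣb)^{1/2}). It is only an illustration of that general formula, not the paper's LFID algorithm. The function name `frechet_distance`, the 8-dimensional random features, and the choice of feature extractor are all assumptions made for this example; this summary does not specify how the low-dimensional features are obtained.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features).
    Hypothetical helper for illustration; not taken from the paper.
    """
    feats_a = np.asarray(feats_a, dtype=np.float64)
    feats_b = np.asarray(feats_b, dtype=np.float64)

    # Fit a Gaussian (mean and covariance) to each feature set.
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr((cov_a cov_b)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b; tiny negative values from round-off
    # are clipped to zero before taking the root.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


# Assumed stand-in data: 500 samples of 8-dimensional "features".
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
shifted = real + 3.0  # same covariance, mean moved by 3 in every dimension

d_same = frechet_distance(real, real)      # approximately 0
d_shift = frechet_distance(real, shifted)  # approximately 8 * 3**2 = 72
```

A lower value means the two feature distributions are closer. In practice the inputs would be embeddings of real and synthetic images produced by the same feature extractor, and working in a low-dimensional feature space keeps the covariance estimate stable with fewer samples.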

Paper Structure (51 sections, 25 equations, 16 figures, 2 tables, 3 algorithms)

Introduction
Problem Statement
Research Objectives
Scope and Significance
Structure of the Paper
Literature Review
Challenges in Realistic Image Generation
Current Evaluation Methods
Evaluation Metrics for Generative Models
Classifier-based Metrics
Divergence-based Metrics
Methodology
Research Design
Dataset
Synthetic Image Generation
...and 36 more sections

Figures (16)

Figure 1: Evaluation Metrics for Generative Models
Figure 2: Arabic Handwritten Digits Dataset (AHDD)
Figure 3: Conditional GAN
Figure 4: C-GAN Architecture
Figure 5: C-VAE Architecture
...and 11 more figures
