A comparative analysis of SRGAN models
Fatemeh Rezapoor Nikroo, Ajinkya Deshmukh, Anantha Sharma, Adrian Tam, Kaarthik Kumar, Cleo Norris, Aditya Dangi
TL;DR
This work addresses OCR reliability on degraded real-world text by evaluating four super-resolution (SR) models (EDSR, EDSR-BASE, ESRGAN, Real-ESRGAN) within a degradation-to-SR-to-OCR pipeline, using PSNR/SSIM and Tesseract OCR. The study clarifies the architectural differences among generator and discriminator designs across SRGAN variants, including RRDB blocks, residual scaling, and relativistic discriminators, and contrasts these GAN-based designs with non-GAN, CNN-based SR (EDSR/EDSR-BASE). Empirical results indicate that EDSR-BASE delivers the best balance of high quantitative image quality and robust OCR accuracy at lower compute cost, while Real-ESRGAN can excel on complex textures at higher computational cost; ESRGAN variants are less effective on real-world degradations. The findings provide practical guidance for selecting SR methods when high-fidelity text recognition is the primary objective, and they point to future work on evaluating OCR robustness with alternative engines and degradations.
Abstract
In this study, we evaluate the performance of multiple state-of-the-art SRGAN (Super-Resolution Generative Adversarial Network) models, namely ESRGAN, Real-ESRGAN, and EDSR, on a benchmark dataset of real-world images that are first degraded by a degradation pipeline. Our results show that some models significantly increase the resolution of the input images while preserving their visual quality, which we assess using the Tesseract OCR engine. We observe that the EDSR-BASE model from Hugging Face outperforms the remaining candidate models in both quantitative metrics and subjective visual quality assessments, with the least compute overhead. Specifically, EDSR generates images with higher peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) values, and these images yield high-quality OCR results with the Tesseract OCR engine. These findings suggest that EDSR is a robust and effective approach for single-image super-resolution and may be particularly well suited to applications where high visual fidelity is critical and compute is limited.
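For readers unfamiliar with the PSNR metric mentioned above, the following is a minimal sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE), written in plain Python. The function name `psnr`, the list-of-rows image representation, and the 8-bit peak value of 255 are assumptions made for illustration; this is not the evaluation code used in the study.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Return PSNR in dB between two equally sized grayscale images.

    Images are given as lists of rows of pixel intensities (an assumed,
    illustrative representation). max_value is the peak intensity,
    assumed here to be 255 for 8-bit images.
    """
    if len(reference) != len(restored):
        raise ValueError("images must have the same height")

    squared_error = 0.0
    count = 0
    for ref_row, out_row in zip(reference, restored):
        if len(ref_row) != len(out_row):
            raise ValueError("images must have the same width")
        for r, o in zip(ref_row, out_row):
            squared_error += (r - o) ** 2
            count += 1

    if count == 0:
        raise ValueError("images must not be empty")

    mse = squared_error / count
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 2x2 example: two pixels differ by 10 intensity levels, so MSE = 50.
reference = [[0, 255], [255, 0]]
restored = [[0, 245], [255, 10]]
print(f"PSNR: {psnr(reference, restored):.2f} dB")
```

A higher PSNR means the super-resolved image is closer, pixel by pixel, to the reference. It does not by itself guarantee legible text, which is why the study pairs it with SSIM and OCR output.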
