Efficient Document Image Dewarping via Hybrid Deep Learning and Cubic Polynomial Geometry Restoration
Valery Istomin, Oleg Pereziabov, Ilya Afanasyev
TL;DR
This work addresses geometric distortions in camera-captured documents that degrade OCR quality. It proposes a hybrid pipeline that uses YOLOv8 for document detection and classical computer vision to build a topology-preserving grid via cubic polynomial interpolation, followed by bicubic remapping. The approach outperforms both mobile scanning apps and pure deep learning methods on OCR and geometry restoration metrics, with Character Error Rate CER=$0.0235$, Levenshtein Distance LD=$27.8$, and Jaro-Winkler similarity JW=$0.902$, while remaining CPU-friendly and memory-light. An open-source DRCCBI framework and a 392-image annotated dataset support reproducibility and further research, and the low resource footprint makes the method practical for resource-constrained digitization workflows.
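To make the reported OCR metrics concrete, the following is a minimal sketch of how Levenshtein Distance and Character Error Rate are commonly computed from a ground-truth text and an OCR output. The function names `levenshtein` and `cer` are illustrative, and normalizing by the reference length is an assumption; the exact definitions used in the paper's evaluation are not stated in this summary. Jaro-Winkler similarity is omitted for brevity.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

Under this definition, a CER of 0.0235 would correspond to roughly 2 to 3 character edits per 100 reference characters.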
Abstract
Camera-captured document images often suffer from geometric distortions caused by paper deformation, perspective distortion, and lens aberrations, which significantly reduce OCR accuracy. This study develops an efficient automated method for document image dewarping that balances accuracy with computational efficiency. We propose a hybrid approach that combines deep learning for document detection with classical computer vision (CV) for geometry restoration. YOLOv8 performs initial document segmentation and mask generation. Classical CV techniques then construct a topological 2D grid through cubic polynomial interpolation of the document boundaries, followed by image remapping to correct nonlinear distortions. A new annotated dataset and an open-source framework are provided to facilitate reproducibility and further research. In experimental evaluation against state-of-the-art methods (RectiNet, DocGeoNet, DocTr++) and mobile applications (DocScan, CamScanner, TapScanner), our method achieves the lowest median Character Error Rate (CER=0.0235) and Levenshtein Distance (LD=27.8) and the highest Jaro-Winkler similarity (JW=0.902), approaching the quality of scanned originals. The approach requires significantly less computation and memory than pure deep learning solutions while delivering better OCR readability and geometry restoration quality, making it suitable for resource-constrained applications that still need high-quality document digitization. Project page: https://github.com/HorizonParadox/DRCCBI
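The grid-construction step can be illustrated with a small sketch. This is not the DRCCBI implementation: the function names `cubic_through` and `build_grid` are hypothetical, each curved edge is assumed to be described by exactly four boundary points, the left and right edges are assumed to be straight vertical lines, and interior grid rows are obtained by linear blending between the top and bottom curves.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def cubic_through(points: Sequence[Point]) -> Callable[[float], float]:
    """Return the unique cubic y(x) through four (x, y) points (Lagrange form)."""
    if len(points) != 4:
        raise ValueError("exactly four points are needed for a cubic")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    def evaluate(x: float) -> float:
        total = 0.0
        for i in range(4):
            term = ys[i]
            for j in range(4):
                if j != i:
                    term *= (x - xs[j]) / (xs[i] - xs[j])
            total += term
        return total

    return evaluate


def build_grid(top: Sequence[Point], bottom: Sequence[Point],
               left_x: float, right_x: float,
               rows: int, cols: int) -> List[List[Point]]:
    """Sample a rows x cols grid of source coordinates between two cubic edges.

    grid[r][c] is the (x, y) position in the distorted photo that should
    land at row r, column c of the flattened output.
    """
    if rows < 2 or cols < 2:
        raise ValueError("rows and cols must both be at least 2")
    top_curve = cubic_through(top)
    bottom_curve = cubic_through(bottom)
    grid: List[List[Point]] = []
    for r in range(rows):
        v = r / (rows - 1)  # 0 on the top edge, 1 on the bottom edge
        row: List[Point] = []
        for c in range(cols):
            u = c / (cols - 1)  # 0 on the left edge, 1 on the right edge
            x = left_x + u * (right_x - left_x)
            # Blend the two curved edges to follow the page curvature.
            y = (1 - v) * top_curve(x) + v * bottom_curve(x)
            row.append((x, y))
        grid.append(row)
    return grid
```

In a full pipeline, a dense version of such a grid (one entry per output pixel) could serve as the coordinate maps for a bicubic remap, for instance with OpenCV's `cv2.remap` and `cv2.INTER_CUBIC`. Whether the framework builds its maps this way is an assumption here.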
