
Efficient Document Image Dewarping via Hybrid Deep Learning and Cubic Polynomial Geometry Restoration

Valery Istomin, Oleg Pereziabov, Ilya Afanasyev

TL;DR

This work addresses geometric distortions in camera-captured documents that degrade OCR quality. It proposes a hybrid pipeline that uses YOLOv8 for robust document detection and classical computer vision to build a topology-preserving grid via cubic polynomial interpolation, followed by bicubic remapping, to achieve efficient and accurate dewarping. The approach outperforms both mobile scanning apps and pure deep learning methods on OCR and geometry restoration metrics, with CER = 0.0235, LD = 27.8, and JW = 0.902, while remaining CPU-friendly and memory-light. The open-source DRCCBI framework and a 392-image annotated dataset support reproducibility and further research, underscoring the method's practical value for resource-constrained digitization workflows.
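The TL;DR reports three OCR metrics. The minimal Python sketch below shows how CER, Levenshtein distance (LD), and Jaro–Winkler similarity (JW) are commonly computed between OCR output and ground-truth text. It assumes raw strings with no text normalisation and the usual Winkler defaults (prefix scale p = 0.1, maximum prefix length 4). The paper's exact preprocessing is not stated in this summary, and the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character Error Rate: edit distance normalised by the reference length."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(hypothesis, reference) / len(reference)


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)  # how far apart matching chars may be
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not used2[j] and s2[j] == c:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half-transpositions.
    mismatched, k = 0, 0
    for i in range(len1):
        if used1[i]:
            while not used2[k]:
                k += 1
            mismatched += s1[i] != s2[k]
            k += 1
    transpositions = mismatched / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix (up to max_prefix)."""
    j = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1, s2):
        if c1 != c2 or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

Lower CER and LD and higher JW are better. Per-document scores would then be aggregated, for example with `statistics.median`, to obtain summary figures like the medians reported for this method.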

Abstract

Camera-captured document images often suffer from geometric distortions caused by paper deformation, perspective distortion, and lens aberrations, which significantly reduce OCR accuracy. This study develops an efficient automated method for document image dewarping that balances accuracy with computational efficiency. We propose a hybrid approach that combines deep learning for document detection with classical computer vision for geometry restoration. YOLOv8 first performs document segmentation and mask generation. Classical CV techniques then construct a topological 2D grid through cubic polynomial interpolation of the document boundaries, followed by image remapping to correct nonlinear distortions. A new annotated dataset and an open-source framework are provided to facilitate reproducibility and further research. Experimental evaluation against state-of-the-art methods (RectiNet, DocGeoNet, DocTr++) and mobile applications (DocScan, CamScanner, TapScanner) demonstrates superior performance: our method achieves the lowest median Character Error Rate (CER = 0.0235) and Levenshtein Distance (LD = 27.8) and the highest Jaro–Winkler similarity (JW = 0.902), approaching the quality of scanned originals. The approach requires substantially less computation and memory than pure deep learning solutions while delivering better OCR readability and geometry restoration quality, making it suitable for resource-constrained applications that still demand high-quality document digitization. Project page: https://github.com/HorizonParadox/DRCCBI
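The grid-construction step described in the abstract can be illustrated with a hypothetical NumPy sketch. It assumes the four document sides are already available as ordered (x, y) point arrays (for example, from the segmented mask contour). It fits cubics y = f(x) to the top and bottom sides and x = g(y) to the left and right sides, finds the corners by intersecting those curves, and fills the interior with a bilinearly blended Coons patch. The Coons blend and the fixed-point corner search are this example's own choices and may differ from how DRCCBI actually forms its grid. `fit_side`, `corner`, and `boundary_grid` are illustrative names.

```python
import numpy as np


def fit_side(points: np.ndarray, horizontal: bool) -> np.ndarray:
    """Least-squares cubic: y = f(x) for top/bottom sides, x = g(y) for left/right sides."""
    x, y = points[:, 0], points[:, 1]
    return np.polyfit(x, y, 3) if horizontal else np.polyfit(y, x, 3)


def corner(f_h: np.ndarray, g_v: np.ndarray, x0: float, iters: int = 100) -> np.ndarray:
    """Intersect y = f_h(x) with x = g_v(y) by fixed-point iteration x <- g_v(f_h(x)).

    Assumes the iteration converges, which holds when the horizontal side is
    roughly horizontal and the vertical side roughly vertical.
    """
    x = float(x0)
    for _ in range(iters):
        x = float(np.polyval(g_v, np.polyval(f_h, x)))
    return np.array([x, float(np.polyval(f_h, x))])


def boundary_grid(top, bottom, left, right, rows: int, cols: int) -> np.ndarray:
    """Return a (rows, cols, 2) array of (x, y) source positions covering the page."""
    ft, fb = fit_side(top, True), fit_side(bottom, True)
    gl, gr = fit_side(left, False), fit_side(right, False)
    tl = corner(ft, gl, left[:, 0].mean())
    tr = corner(ft, gr, right[:, 0].mean())
    bl = corner(fb, gl, left[:, 0].mean())
    br = corner(fb, gr, right[:, 0].mean())

    u = np.linspace(0.0, 1.0, cols)[None, :, None]  # parameter across the page
    v = np.linspace(0.0, 1.0, rows)[:, None, None]  # parameter down the page

    def horiz(f, a, b):  # samples of y = f(x) from corner a to corner b, shape (1, cols, 2)
        xs = a[0] + (b[0] - a[0]) * u[..., 0]
        return np.stack([xs, np.polyval(f, xs)], axis=-1)

    def vert(g, a, b):  # samples of x = g(y) from corner a to corner b, shape (rows, 1, 2)
        ys = a[1] + (b[1] - a[1]) * v[..., 0]
        return np.stack([np.polyval(g, ys), ys], axis=-1)

    T, B = horiz(ft, tl, tr), horiz(fb, bl, br)
    L, R = vert(gl, tl, bl), vert(gr, tr, br)
    # Coons patch: blend opposite boundary curves, then subtract the doubly counted corners.
    corners = (1 - u) * (1 - v) * tl + u * (1 - v) * tr + (1 - u) * v * bl + u * v * br
    return (1 - v) * T + v * B + (1 - u) * L + u * R - corners
```

Evaluated with `rows` and `cols` equal to the output height and width, the two channels of the returned array could serve as dense x and y maps for a remapping step. A coarser grid would instead need to be upsampled first.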

Paper Structure

This paper contains 26 sections, 7 equations, 4 figures, 8 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Flowchart of the document geometry restoration and dewarping algorithm: 1) Document mask detection with YOLOv8; 2) Contour edge detection and segmentation into document sides; 3) Construction of a topological 2D grid via interpolation with cubic polynomials; 4) Detection of intersection points, grid formation, transformation mapping, and image remapping.
  • Figure 2: Example input images from the test set of 15 camera-captured documents with diverse geometric distortions (e.g., perspective, curvature, folds), used to evaluate mobile scanning applications: DocScan, CamScanner, and TapScanner.
  • Figure 3: Visual comparison of documents reconstructed by our algorithm and popular desktop deep learning (DL) models: DocTr++, DocGeoNet, and RectiNet.
  • Figure 4: Visual comparison of documents reconstructed by the proposed algorithm and popular deep learning models (DocTr++, DocGeoNet, RectiNet) on an additional dataset. Multiple documents with diverse geometric distortions, paper colors, and lighting conditions show that our method consistently achieves better boundary precision and topology preservation, resulting in sharper text edges and fewer spurious artifacts than all alternatives.
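Step 4 of the Figure 1 flowchart ends with image remapping, which the TL;DR describes as bicubic. In practice this is usually a single library call such as OpenCV's `cv2.remap(img, map_x, map_y, cv2.INTER_CUBIC)` with float32 maps. To show what such a call computes, here is a dependency-free NumPy sketch using the Keys cubic-convolution kernel. The kernel parameter a = -0.5 and the clamped (replicate) border handling are assumptions, since libraries differ in their exact coefficients and border modes. `remap_bicubic` is a hypothetical helper name.

```python
import numpy as np


def keys_kernel(t: np.ndarray, a: float = -0.5) -> np.ndarray:
    """Keys cubic-convolution kernel; a = -0.5 is an assumed, commonly used value."""
    t = np.abs(t)
    t2, t3 = t * t, t * t * t
    near = (a + 2) * t3 - (a + 3) * t2 + 1          # branch for |t| <= 1
    far = a * t3 - 5 * a * t2 + 8 * a * t - 4 * a    # branch for 1 < |t| < 2
    return np.where(t <= 1, near, np.where(t < 2, far, 0.0))


def remap_bicubic(img: np.ndarray, map_x: np.ndarray, map_y: np.ndarray) -> np.ndarray:
    """out[i, j] = img sampled at (map_x[i, j], map_y[i, j]) with 4x4 bicubic weights.

    Neighbour indices are clamped to the image (replicate border).
    Accepts grayscale (H, W) or multi-channel (H, W, C) images; returns float64.
    """
    h, w = img.shape[:2]
    x0 = np.floor(map_x).astype(int)
    y0 = np.floor(map_y).astype(int)
    fx, fy = map_x - x0, map_y - y0  # fractional offsets inside the pixel cell
    out = np.zeros(map_x.shape + img.shape[2:], dtype=np.float64)
    for m in range(-1, 3):  # four neighbouring rows
        wy = keys_kernel(fy - m)
        yi = np.clip(y0 + m, 0, h - 1)
        for n in range(-1, 3):  # four neighbouring columns
            wx = keys_kernel(fx - n)
            xi = np.clip(x0 + n, 0, w - 1)
            weight = wx * wy
            if img.ndim == 3:
                weight = weight[..., None]  # broadcast over colour channels
            out += weight * img[yi, xi]
    return out
```

The kernel is separable, so each output pixel is a weighted sum over a 4×4 neighbourhood. It interpolates exactly at integer positions, which means an identity map reproduces the input image.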