Axis-Aligned Document Dewarping
Chaoyun Wang, I-Chao Shen, Takeo Igarashi, Caigui Jiang
TL;DR
This work tackles document dewarping by exploiting an axis-aligned property: the feature lines of a well-rectified document align with the coordinate axes. It introduces an axis-aligned geometric constraint for training, an axis-alignment preprocessing step for inference, and a new Axis-Aligned Distortion (AAD) metric for evaluation, all integrated into a grid-based dewarping network that predicts both 2D unwarping and 3D grid meshes. The approach achieves state-of-the-art performance on multiple benchmarks, improving the AAD metric by 18.2% to 34.5% and improving OCR reliability, while ablations indicate that the training constraint and the preprocessing step (abbreviated AL and AP) contribute complementary gains. By grounding learning in intrinsic document geometry, this strategy offers robust dewarping across varied distortions and lays groundwork for extending axis-aligned priors to other rectification tasks.
Abstract
Document dewarping is crucial for many applications. However, existing learning-based methods rely heavily on supervised regression with annotated data without fully leveraging the inherent geometric properties of physical documents. Our key insight is that a well-dewarped document is defined by its axis-aligned feature lines. This property aligns with the inherent axis-aligned nature of the discrete grid geometry in planar documents. Harnessing this property, we introduce three synergistic contributions: for the training phase, we propose an axis-aligned geometric constraint to enhance document dewarping; for the inference phase, we propose an axis alignment preprocessing strategy to reduce the dewarping difficulty; and for the evaluation phase, we introduce a new metric, Axis-Aligned Distortion (AAD), that not only incorporates geometric meaning and aligns with human visual perception but also demonstrates greater robustness. As a result, our method achieves state-of-the-art performance on multiple existing benchmarks, improving the AAD metric by 18.2% to 34.5%. The code is publicly available at https://github.com/chaoyunwang/AADD.
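
To make the axis-aligned property concrete, the following is a minimal sketch of one way to score how far a grid of 2D vertices deviates from axis alignment. It is an illustration only and is not the constraint or the AAD metric defined in the paper: the function names (`axis_alignment_penalty`, `regular_grid`, `bend`), the variance-based penalty, and the sinusoidal bend are all assumptions chosen for this example. The idea is that on an axis-aligned grid every row shares one y value and every column shares one x value, so the spread of those coordinates is zero.

```python
import math
from statistics import pvariance


def axis_alignment_penalty(grid):
    """Illustrative penalty (hypothetical, not the paper's formulation).

    grid[i][j] is the (x, y) position of the vertex in row i, column j.
    Returns the mean variance of y along each row plus the mean variance
    of x along each column; an axis-aligned grid scores zero.
    """
    rows, cols = len(grid), len(grid[0])
    # y should be constant along each row of an axis-aligned grid.
    row_term = sum(
        pvariance([grid[i][j][1] for j in range(cols)]) for i in range(rows)
    ) / rows
    # x should be constant along each column of an axis-aligned grid.
    col_term = sum(
        pvariance([grid[i][j][0] for i in range(rows)]) for j in range(cols)
    ) / cols
    return row_term + col_term


def regular_grid(rows, cols):
    """Axis-aligned grid covering the unit square."""
    return [
        [(j / (cols - 1), i / (rows - 1)) for j in range(cols)]
        for i in range(rows)
    ]


def bend(grid, amplitude=0.05):
    """Assumed toy distortion: shift y by a sine wave in x, curving the rows."""
    return [
        [(x, y + amplitude * math.sin(2 * math.pi * x)) for (x, y) in row]
        for row in grid
    ]


flat = regular_grid(5, 7)
curved = bend(flat)
print("flat penalty:  ", axis_alignment_penalty(flat))
print("curved penalty:", axis_alignment_penalty(curved))
```

In this sketch the flat grid yields a penalty of zero, while the bent grid yields a positive value that grows with the bend amplitude. A training loss or evaluation metric built on the same intuition would need to handle image-space sampling and scale normalization, which this example leaves out.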
