Axis-Aligned Document Dewarping

Chaoyun Wang, I-Chao Shen, Takeo Igarashi, Caigui Jiang

TL;DR

This work tackles document dewarping by exploiting an axis-aligned property: the feature lines of a correctly rectified document align with the coordinate axes. It introduces an axis-aligned geometric constraint for training, an axis-alignment preprocessing step for inference, and a new Axis-Aligned Distortion (AAD) metric for evaluation, all integrated into a grid-based dewarping network that predicts both a 2D unwarping grid and a 3D grid mesh. The approach achieves state-of-the-art performance on multiple benchmarks, improving the AAD metric by 18.2% to 34.5% and improving OCR reliability, while ablations confirm complementary benefits from the AL (training constraint) and AP (alignment preprocessing) components. By grounding learning in intrinsic document geometry, this principle-driven strategy offers robust dewarping across varied distortions and lays groundwork for extending axis-aligned priors to other rectification tasks.

Abstract

Document dewarping is crucial for many applications. However, existing learning-based methods rely heavily on supervised regression with annotated data without fully leveraging the inherent geometric properties of physical documents. Our key insight is that a well-dewarped document is defined by its axis-aligned feature lines. This property aligns with the inherent axis-aligned nature of the discrete grid geometry in planar documents. Harnessing this property, we introduce three synergistic contributions: for the training phase, we propose an axis-aligned geometric constraint to enhance document dewarping; for the inference phase, we propose an axis alignment preprocessing strategy to reduce the dewarping difficulty; and for the evaluation phase, we introduce a new metric, Axis-Aligned Distortion (AAD), that not only incorporates geometric meaning and aligns with human visual perception but also demonstrates greater robustness. As a result, our method achieves state-of-the-art performance on multiple existing benchmarks, improving the AAD metric by 18.2% to 34.5%. The code is publicly available at https://github.com/chaoyunwang/AADD.
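To make the axis-aligned property concrete, the following minimal sketch measures how far a regular point grid is from being axis-aligned. It is an illustration only, not the paper's training constraint or its AAD metric: the function name `axis_alignment_residual`, the list-of-rows grid layout, and the use of variance as the penalty are all assumptions introduced here.

```python
from statistics import pvariance


def axis_alignment_residual(grid):
    """Hypothetical residual for a grid given as a list of rows of (x, y) points.

    Sums the mean variance of y along each row and the mean variance of x
    along each column. The result is zero exactly when every grid row is
    horizontal and every grid column is vertical.
    """
    rows = len(grid)
    cols = len(grid[0])
    row_term = sum(pvariance([p[1] for p in row]) for row in grid) / rows
    col_term = sum(
        pvariance([grid[r][c][0] for r in range(rows)]) for c in range(cols)
    ) / cols
    return row_term + col_term


# A perfectly axis-aligned grid with 3 rows and 4 columns.
flat = [[(float(c), float(r)) for c in range(4)] for r in range(3)]

# The same grid with slanted rows: y now grows with x, x is unchanged.
slanted = [[(float(c), r + 0.1 * c) for c in range(4)] for r in range(3)]

flat_residual = axis_alignment_residual(flat)
slanted_residual = axis_alignment_residual(slanted)
```

Here `flat_residual` is zero, while `slanted_residual` is positive because the rows are no longer horizontal. A residual of this kind depends only on the geometry of the grid, which is the sense in which the axis-aligned property can be checked without per-point annotations.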

Paper Structure

This paper contains 24 sections, 13 equations, 15 figures, 2 tables.

Figures (15)

  • Figure 1: Our research motivation and main contributions. (a) A warped document (left) and its target rectified version (right). The key characteristic of the rectified version is the alignment of its feature lines (highlighted by dotted lines) with the axes. We term this the "axis-aligned property". (b) Inspired by (a), we integrate this axis-aligned property into the training, inference, and evaluation stages of our deep learning method.
  • Figure 2: Axis-aligned geometric constraints are enforced while training the document dewarping network. The top row shows the predicted 2D unwarping grid (obtained from Grid-Net in UVDoc [verhoeven2023uvdoc]), and the bottom row shows its transformation into UV space based on the ground truth, which allows the axis-aligned geometric constraint loss to be computed along the horizontal and vertical directions.
  • Figure 3: Illustration of the dewarping inference process with axis alignment preprocessing. The red grid represents the predicted 2D unwarping grid, while the blue rectangle indicates the minimum-area rotated rectangle computed from it.
  • Figure 4: Example visualization of the AAD metric. The first row shows the ground truth image and its gradient heatmaps in the horizontal ($G_{y}$) and vertical ($G_{x}$) directions. The second and third rows show two different dewarped results and their overlay heatmaps for AAD and its horizontal (AAD_H) and vertical (AAD_V) components. The $\leftrightarrow$ and $\updownarrow$ arrows indicate the directions used to visualize the axis-aligned error of the dewarped feature lines.
  • Figure 5: Visualization and comparison of the AD and AAD metrics on an example. (a) The ground truth image (top) and its corresponding gradient map (bottom). (b) Dewarped results from [verhoeven2023uvdoc] (top) and our method (bottom). (c) Heatmaps of the AD metric for the results shown in (b). (d) Corresponding heatmaps of our proposed AAD metric.
  • ...and 10 more figures
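
The Figure 3 caption mentions a minimum-area rotated rectangle computed from the predicted 2D unwarping grid. The sketch below shows one way such a rectangle's orientation could be found and used to rotate a point set back to the axes. It is a hedged illustration: the function names, the edge-by-edge search, and the choice to undo the rotation by the negated angle are assumptions made here, not details confirmed by the paper. OpenCV's `cv2.minAreaRect` offers a ready-made alternative for the rectangle itself.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rect_angle(points):
    """Return (angle_in_radians, area) of the minimum-area enclosing rectangle.

    A minimum-area rectangle has a side collinear with some hull edge, so it
    is enough to try each edge direction. The angle is folded into
    [-pi/4, pi/4) because a rectangle's orientation repeats every 90 degrees.
    """
    hull = convex_hull(points)
    best = None
    for i in range(len(hull)):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y1 - y0, x1 - x0)
        c, s = math.cos(theta), math.sin(theta)
        # Hull coordinates in a frame whose first axis follows this edge.
        us = [x * c + y * s for x, y in hull]
        vs = [-x * s + y * c for x, y in hull]
        area = (max(us) - min(us)) * (max(vs) - min(vs))
        if best is None or area < best[1]:
            best = (theta, area)
    theta, area = best
    theta = (theta + math.pi / 4) % (math.pi / 2) - math.pi / 4
    return theta, area


def rotate(points, angle):
    """Rotate points counter-clockwise about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(x * c - y * s, x * s + y * c) for x, y in points]


# Toy stand-in for a predicted grid: 4 rows of 5 points, tilted by 20 degrees.
grid = [(float(c), float(r)) for r in range(4) for c in range(5)]
tilted = rotate(grid, math.radians(20))

angle, area = min_area_rect_angle(tilted)
aligned = rotate(tilted, -angle)  # undo the estimated tilt
```

For this toy grid the recovered angle is about 20 degrees and the rectangle area is about 12, so rotating by the negated angle returns the rows to horizontal. On a real warped page the grid is not a rigid rectangle, so the rectangle only captures the dominant orientation and any remaining distortion would be left to the dewarping network.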