D2Dewarp: Dual Dimensions Geometric Representation Learning Based Document Image Dewarping

Heng Li, Xiangping Wu, Qingcai Chen

TL;DR

D2Dewarp is a fine-grained deformation perception model that focuses on the dual dimensions of a document, its horizontal and vertical lines, to improve document dewarping. It achieves better rectification results than state-of-the-art methods.

Abstract

Document image dewarping remains a challenging task in the deep learning era. While existing methods have improved by leveraging text line awareness, they typically focus only on a single horizontal dimension. In this paper, we propose D2Dewarp, a fine-grained deformation perception model that focuses on the dual dimensions of a document, its horizontal and vertical lines, to improve document dewarping. It can perceive distortion trends in different directions across document details. To combine the horizontal and vertical granularity features, an effective fusion module based on X and Y coordinates is designed to facilitate interaction and constraint between the two dimensions for feature complementarity. Due to the lack of annotated line features in current public dewarping datasets, we also propose an automatic fine-grained annotation method that uses public document texture images and an automatic rendering engine to build a new large-scale distortion training dataset named DocDewarpHV. On three public Chinese and English benchmarks, both quantitative and qualitative results show that our method achieves better rectification results than state-of-the-art methods. The code and dataset are available at https://github.com/xiaomore/D2Dewarp.

Paper Structure

This paper contains 9 sections, 6 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Visualization of our proposed dataset DocDewarpHV. The first column is the distorted document image with a complex background. 3D Coordinate: the position of each pixel of the distorted image in three-dimensional space. UV map: maps each point on the 3D surface to its location in the 2D texture coordinate system. Horizontal Lines: in addition to text lines, these also include the top and bottom boundaries of documents, tables, figures, and paragraphs. Vertical Lines: the left and right boundaries of the deformed document area, tables, figures, and paragraphs.
  • Figure 2: The architecture of our proposed method D2Dewarp. A segmentation model with a UNet structure predicts the lines in both dimensions. The dual decoders share the same encoder. Each decoder layer outputs a line prediction, and the feature map of each layer is then resized to one-eighth of the input image size and concatenated to obtain $F_h$ and $F_v$ respectively. The HV Fusion Module fuses the feature maps of the horizontal and vertical lines. For better visualization, we omit the UNet skip connections in this figure.
  • Figure 3: The HV Fusion Module. $\boldsymbol{F}_h$ and $\boldsymbol{F}_v$ represent the horizontal and vertical feature maps obtained by the segmentation model as the input of this module. X and Y Pool are $AvgPool$ in Equation (\ref{equal:avgpool}) using $AdaptiveAvgPool2d$. $Sig.$ refers to Sigmoid activation function. $C$ denotes concatenation. Arrows in red and blue indicate the pathways of feature fusion in the X and Y directions.
  • Figure 4: Qualitative visual comparison between our proposed method and existing methods. The first column is the input distorted document image, and the last column is the dewarping result of our method. The middle columns show the results of previous methods. Colored arrows and dashed lines highlight differences.
  • Figure 5: Qualitative visual comparison of local rectification in the horizontal direction between our proposed D2Dewarp and existing methods, including DewarpNet Das2019DewarpNetSD, DocGeoNet Feng2022GeometricRL, PaperEdge Ma2022LearningFD, DocScanner feng2025docscanner, and LA-DocFlatten Li2023LayoutawareSD. The results demonstrate the superior capability of our method in correcting horizontal distortions. In the figure, "Distorted" represents the input warped image, while "Ground Truth" corresponds to the flattened reference. Red boxes mark local regions and green dashed lines show differences.
  • ...and 2 more figures
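
The feature flow described in the captions of Figures 2 and 3 can be illustrated with a rough NumPy sketch. It is not the authors' implementation: the captions state only that each decoder layer's feature map is resized to one-eighth of the input and concatenated, and that the HV Fusion Module uses X and Y average pooling, a Sigmoid, and concatenation. Everything else here is an assumption made for illustration: the nearest-neighbour resize, the cross-gating wiring (each branch modulated by a pooled, Sigmoid-activated summary of the other branch), and all function names (`resize_nearest`, `gather_scales`, `x_pool`, `y_pool`, `hv_fuse_sketch`).

```python
import numpy as np


def resize_nearest(feat, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) array to (C, out_h, out_w).
    The interpolation mode is an assumption; the captions do not state it."""
    _, h, w = feat.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return feat[:, rows[:, None], cols[None, :]]


def gather_scales(layer_feats, input_hw):
    """Resize every decoder-layer map to 1/8 of the input size and
    concatenate along the channel axis (the step named in Figure 2)."""
    out_h, out_w = input_hw[0] // 8, input_hw[1] // 8
    resized = [resize_nearest(f, out_h, out_w) for f in layer_feats]
    return np.concatenate(resized, axis=0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def x_pool(feat):
    """Average over the width axis: (C, H, W) -> (C, H, 1).
    Comparable to torch.nn.AdaptiveAvgPool2d((None, 1))."""
    return feat.mean(axis=2, keepdims=True)


def y_pool(feat):
    """Average over the height axis: (C, H, W) -> (C, 1, W).
    Comparable to torch.nn.AdaptiveAvgPool2d((1, None))."""
    return feat.mean(axis=1, keepdims=True)


def hv_fuse_sketch(f_h, f_v):
    """Hypothetical cross-gating: each branch is scaled by a Sigmoid gate
    pooled from the other branch, then both are concatenated on channels.
    This wiring is an assumption, not the module in the paper."""
    gate_from_v = sigmoid(x_pool(f_v))  # (C, H, 1), broadcast over width
    gate_from_h = sigmoid(y_pool(f_h))  # (C, 1, W), broadcast over height
    f_h_out = f_h * gate_from_v
    f_v_out = f_v * gate_from_h
    return np.concatenate([f_h_out, f_v_out], axis=0)  # (2C, H, W)


# Toy usage: three decoder layers per branch for a 64x64 input.
rng = np.random.default_rng(0)
input_hw = (64, 64)
decoder_h = [rng.standard_normal((4, s, s)) for s in (8, 16, 32)]
decoder_v = [rng.standard_normal((4, s, s)) for s in (8, 16, 32)]
f_h = gather_scales(decoder_h, input_hw)
f_v = gather_scales(decoder_v, input_hw)
fused = hv_fuse_sketch(f_h, f_v)
print(f_h.shape, f_v.shape, fused.shape)
```

Pooling along a single axis keeps positional information along the other axis, which is one plausible reading of why the module pools in the X and Y directions separately rather than globally.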