Cascaded Robust Rectification for Arbitrary Document Images

Chaoyun Wang, Quanxin Huang, I-Chao Shen, Takeo Igarashi, Nanning Zheng, Caigui Jiang

TL;DR

This paper introduces a cascaded, three-stage rectification framework (L-Net for perspective, C-Net for geometry, F-Net for content) that progressively corrects arbitrary document distortions in a coarse-to-fine manner. It leverages canonical view normalization, an adaptive iterative refinement, and a principled loss design to achieve state-of-the-art results across multiple benchmarks. To address evaluation weaknesses, it proposes layout-aligned OCR metrics (AED/ACER), which decouple rectification quality from the layout analysis errors of OCR engines, and masked geometric metrics (AD-M/AAD-M), which exclude background regions when measuring geometric distortion. The approach demonstrates robust performance, efficiency, and practical applicability, with detailed ablations and comparisons to commercial tools, and outlines future work toward multi-view reconstruction to overcome single-view limitations.

Abstract

Document rectification in real-world scenarios poses significant challenges due to extreme variations in camera perspectives and physical distortions. Driven by the insight that complex transformations can be decomposed and resolved step by step, we introduce a novel multi-stage framework that reverses distinct distortion types in a coarse-to-fine manner. Specifically, our framework first performs a global affine transformation to correct perspective distortions arising from the camera's viewpoint, then rectifies geometric deformations resulting from physical paper curling and folding, and finally employs a content-aware iterative process to eliminate fine-grained content distortions. To address limitations in existing evaluation protocols, we also propose two enhanced metrics: layout-aligned OCR metrics (AED/ACER) for a stable assessment that decouples geometric rectification quality from the layout analysis errors of OCR engines, and masked AD/AAD (AD-M/AAD-M) tailored for accurately evaluating geometric distortions in documents with incomplete boundaries. Extensive experiments show that our method establishes new state-of-the-art performance on multiple challenging benchmarks, yielding a substantial reduction of 14.1% to 34.7% in the AAD metric and demonstrating superior efficacy in real-world applications. The code will be publicly available at https://github.com/chaoyunwang/ArbDR.
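
As background for the OCR metrics named above, the following is a minimal sketch of the conventional edit distance (ED) and character error rate (CER) commonly used in OCR evaluation. It assumes that AED/ACER build on these two quantities. The layout-alignment step itself is not described in this section and is not shown here. The function names `edit_distance` and `character_error_rate` are illustrative, not taken from the authors' code.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning one string into the other."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the reference length."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref_text = "kitten"   # stand-in for ground-truth text
    ocr_text = "sitting"  # stand-in for OCR output on a rectified image
    print(edit_distance(ref_text, ocr_text))         # 3
    print(character_error_rate(ref_text, ocr_text))  # 0.5
```

Because both quantities compare whole strings, a change in the reading order produced by an OCR engine's layout analysis can raise them even when the rectified geometry is good, which is the weakness the layout-aligned variants are meant to address.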

Paper Structure

This paper contains 44 sections, 10 equations, 9 figures, 3 tables, 1 algorithm.

Figures (9)

  • Figure 1: A systematic classification of arbitrarily distorted documents by type and distortion category. The highlighted red dotted box indicates the ideal cases targeted by current research.
  • Figure 2: Architecture of the adaptive cascaded rectification framework. (a) Our pipeline reverses document distortions in a coarse-to-fine sequence: the L-Net corrects global perspective distortion, the C-Net rectifies coarse shape distortions, and the F-Net performs adaptive iterative refinement of content-level distortions. During inference, these specialist transformations are composed into a single backward mapping from the rectified output to the original input, ensuring high fidelity by minimizing resampling. (b) The effect of canonical view normalization. Our initial affine transformation converts diverse, challenging inputs (left) into a standardized domain (right), simplifying the task for subsequent networks.
  • Figure 3: The geometric principle for our stopping condition. First, axis-aligned reference lines (green) are filtered from all detected lines (blue and green). These reference lines are then transformed by the iterative deformation field, and their final alignment is quantified as the line entropy score (Eq. \ref{eq:align_entropy}).
  • Figure 4: Comparison of OCR evaluation pipelines. (a) The conventional method vs. (b) our proposed layout-aligned method.
  • Figure 5: Comparison of the calculation process for standard AD/AAD (green box) and our proposed masked AD-M/AAD-M (red box). (a) The components used in the calculation. (b) Heat map corresponding to the calculation metrics.
  • ...and 4 more figures
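
The caption of Figure 2 states that the per-stage transformations are composed into a single backward mapping so that the input image is resampled only once. The sketch below illustrates that general idea with NumPy. It is not the authors' implementation: the dense `(H, W, 2)` map representation, the `(x, y)` coordinate order, the border clamping, and the helper names `identity_map`, `bilinear_sample`, and `compose_backward` are all assumptions made for illustration.

```python
import numpy as np


def identity_map(height: int, width: int) -> np.ndarray:
    """Backward map that sends every output pixel to itself, as (x, y) pairs."""
    ys, xs = np.mgrid[0:height, 0:width].astype(float)
    return np.stack([xs, ys], axis=-1)


def bilinear_sample(field: np.ndarray, coords: np.ndarray) -> np.ndarray:
    """Sample an (H, W, C) array at float (x, y) coords, clamped to the border."""
    h, w = field.shape[:2]
    x = np.clip(coords[..., 0], 0, w - 1)
    y = np.clip(coords[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]
    top = field[y0, x0] * (1 - wx) + field[y0, x1] * wx
    bottom = field[y1, x0] * (1 - wx) + field[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def compose_backward(maps_in_stage_order: list) -> np.ndarray:
    """Compose per-stage backward maps into one map.

    maps_in_stage_order[0] maps stage-1 output pixels to input coordinates;
    each later map sends its stage's output pixels to the previous stage's output.
    """
    composite = maps_in_stage_order[0]
    for stage_map in maps_in_stage_order[1:]:
        # Look up where the later stage points, then follow the earlier chain.
        composite = bilinear_sample(composite, stage_map)
    return composite


if __name__ == "__main__":
    h, w = 8, 8
    stage1 = identity_map(h, w) + np.array([1.0, 0.0])  # toy shift in x
    stage2 = identity_map(h, w) + np.array([0.0, 2.0])  # toy shift in y
    composite = compose_backward([stage1, stage2])
    image = np.random.rand(h, w, 3)
    rectified = bilinear_sample(image, composite)  # single resampling of the input
    print(composite[3, 4])  # interior pixel (x=4, y=3) maps to (5, 5)
```

Interpolating the coordinate maps rather than the intermediate images means the pixel data passes through only one interpolation step, which is why composing the maps avoids the blur that repeated resampling would accumulate.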