Unfolder: Fast localization and image rectification of a document with a crease from folding in half

A. M. Ershov, D. V. Tropin, E. E. Limonova, D. P. Nikolaev, V. V. Arlazarov

TL;DR

This work proposes Unfolder, a novel approach that is robust to projective distortions of the document image and does not fragment the image in the vicinity of the crease after rectification. It outperforms the advanced neural network methods DocTr and DewarpNet.

Abstract

Folded documents are commonly presented in modern society. Digitizing such documents by capturing them with a smartphone camera can be tricky, since a crease can divide the document contents into separate planes. To unfold the document, one could hold its edges, but this may obscure the document in the captured image. While there are many geometrical rectification methods, they were usually developed for arbitrary bends and folds. We consider such algorithms and propose a novel approach, Unfolder, developed specifically for images of documents with a crease from folding in half. Unfolder is robust to projective distortions of the document image and does not fragment the image in the vicinity of the crease after rectification. A new Folded Document Images dataset was created to investigate the rectification accuracy of folded (2, 3, 4, and 8 folds) documents. The dataset includes 1600 images captured with the document placed on a table and held in hand. The Unfolder algorithm achieved a recognition error rate of 0.33, which is better than the advanced neural network methods DocTr (0.44) and DewarpNet (0.57). The average runtime of Unfolder was only 0.25 s/image on an iPhone XR.

Paper Structure (6 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: (a) Input image, (b) rectified image with content tearing (marked in orange) caused by inappropriate image stitching, (c) enlarged region.
  • Figure 2: Schematic structure of the Unfolder algorithm
  • Figure 3: (a) Horizontal (green) and vertical (red) edges on the "merged" edge map, (b) detected lines (red and green) and the blue line splitting the image into halves, (c) a pair of quadrilaterals (the top one is depicted with blue, the bottom one with yellow), the true crease line (green) and crease line computed by vertical segments intersection (red), (d) a polyline $ABCD$ approximating a boundary fracture on the path graph marked by a dotted rectangle on (a), (e) detected hexangle, (f) edges lying along the detected hexangle, (g) rectified image.
  • Figure 4: Examples of a continuous ($H_1$, $H_2$) and discontinuous ($H_3$, $H_2$) mapping.
  • Figure 5: Examples of a rejected and accepted rectification: (a) the input sample image with a detected hexangle (blue) and its correction (red), which is rejected, (b) the rectification of this image by the red hexangle, (c) the input sample image with a detected hexangle (blue) and its correction (green), which is accepted, (d) the rectification of this image by the green hexangle.
  • ...and 3 more figures
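
The captions of Figures 1, 3, and 4 refer to a pair of quadrilaterals that share the crease and to continuous versus discontinuous mappings between them. The following is a minimal sketch of that idea, not the authors' implementation: it assumes each half of the document is rectified by its own homography onto one half of a `width` × `height` rectangle, and it measures how far apart the two homographies send points of the crease. All function names, the vertex order of the hexagon, and the sample coordinates are hypothetical, and the gap measure is an illustrative check rather than the criterion used in the paper.

```python
import numpy as np


def homography_from_points(src, dst):
    """Estimate a 3x3 homography mapping four src points to four dst points (DLT)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of this 8x9 system.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)


def apply_homography(h, pts):
    """Apply a homography to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]


def half_homographies(hexagon, width, height):
    """Return one homography per half of the document.

    Assumed vertex order (hypothetical convention): top-left, top-right,
    crease-right, bottom-right, bottom-left, crease-left.
    """
    tl, tr, cr, br, bl, cl = hexagon
    mid = height / 2.0
    h_top = homography_from_points(
        [tl, tr, cr, cl], [(0, 0), (width, 0), (width, mid), (0, mid)]
    )
    h_bottom = homography_from_points(
        [cl, cr, br, bl], [(0, mid), (width, mid), (width, height), (0, height)]
    )
    return h_top, h_bottom


def crease_gap(h_top, h_bottom, cl, cr, samples=11):
    """Largest distance (in output pixels) between the two images of the crease."""
    t = np.linspace(0.0, 1.0, samples)[:, None]
    crease = (1 - t) * np.asarray(cl, float) + t * np.asarray(cr, float)
    diff = apply_homography(h_top, crease) - apply_homography(h_bottom, crease)
    return float(np.linalg.norm(diff, axis=1).max())


# Hypothetical hexagon whose top edge is not parallel to the crease.
hexagon = [(40, 20), (200, 40), (210, 120), (210, 220), (10, 220), (10, 120)]
h_top, h_bottom = half_homographies(hexagon, width=200, height=200)
print(crease_gap(h_top, h_bottom, hexagon[5], hexagon[2]))
```

Both homographies agree at the two crease endpoints by construction, but a homography restricted to a line is fixed only by three points, so the interior of the crease can still be mapped differently by the two halves. A gap near zero corresponds to a mapping that stitches without a visible seam, while a large gap corresponds to the kind of tearing described in the Figure 1 caption.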