
Accurate Fine-grained Layout Analysis for the Historical Tibetan Document Based on the Instance Segmentation

Penghai Zhao, Weilan Wang, Zhengqi Cai, Guowei Zhang, Yuqi Lu

TL;DR

This preliminary research provides insights into fine-grained, sub-line-level layout analysis and validates SOLOv2-based approaches.

Abstract

Accurate layout analysis without subsequent text-line segmentation remains an open challenge, especially for the Kangyur, a historical Tibetan document featuring many touching components and a mottled background. Because it identifies the different regions in a document image, layout analysis is indispensable for subsequent procedures such as character recognition. However, little research has addressed line-level layout analysis, and the existing work fails to handle the Kangyur. To obtain optimal results, we present a fine-grained, sub-line-level layout analysis approach. First, we introduce an accelerated method for building a dynamic and reliable dataset. Second, we enhance SOLOv2 according to the characteristics of the Kangyur. Third, we train the enhanced SOLOv2 on the prepared annotation file. Once the network is trained, instances of text lines, sentences, and titles can be segmented and identified at inference time. Experimental results show that the proposed method achieves 72.7% average precision (AP) on our dataset. Overall, this preliminary research provides insights into fine-grained, sub-line-level layout analysis and validates SOLOv2-based approaches. We also believe that the proposed method can be applied to documents in other languages with various layouts.
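The abstract reports 72.7% average precision but does not state the evaluation protocol here. As a hedged sketch of what mask-level AP measures, the Python below computes single-class AP at one IoU threshold for one image, using greedy score-ordered matching and all-point interpolation of the precision-recall curve. The function names, the 0.5 threshold, and the matching rule are assumptions for illustration, not the authors' evaluation code. COCO-style AP, which many instance-segmentation papers report, additionally averages over IoU thresholds from 0.50 to 0.95 and over classes.

```python
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean masks with the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 0.0


def average_precision(pred_masks, pred_scores, gt_masks, iou_thr=0.5):
    """Hypothetical helper: single-class AP at one IoU threshold (all-point interpolation)."""
    if len(gt_masks) == 0:
        return 0.0
    # Visit predictions from highest to lowest confidence.
    order = np.argsort(-np.asarray(pred_scores, dtype=float), kind="stable")
    matched = [False] * len(gt_masks)
    tp = np.zeros(len(order))
    for rank, idx in enumerate(order):
        # Greedily match to the best still-unmatched ground-truth mask.
        best_iou, best_gt = 0.0, -1
        for g, gt in enumerate(gt_masks):
            if matched[g]:
                continue
            iou = mask_iou(pred_masks[idx], gt)
            if iou > best_iou:
                best_iou, best_gt = iou, g
        if best_gt >= 0 and best_iou >= iou_thr:
            matched[best_gt] = True
            tp[rank] = 1.0  # true positive; otherwise it stays a false positive
    cum_tp = np.cumsum(tp)
    precision = cum_tp / np.arange(1, len(order) + 1)
    recall = cum_tp / len(gt_masks)
    # Pad the curve, then make precision monotonically non-increasing.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas wherever recall changes.
    changed = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[changed + 1] - mrec[changed]) * mpre[changed + 1]))
```

With two predictions that exactly match two ground-truth masks, the function returns 1.0. If the higher-scored prediction misses the only ground-truth mask and the lower-scored one hits it, it returns 0.5, which shows how confident false positives lower AP.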

Paper Structure

This paper contains 12 sections, 1 equation, 15 figures, 4 tables, and 1 algorithm.

Figures (15)

  • Figure 1: Typical document analysis and recognition (DAR) pipelines: the traditional approach (top) and the deep-learning-based approach (bottom)
  • Figure 2: Three features of the historical Tibetan document image: stains (black box), closed or touching strokes (purple and blue boxes), faded strokes (pink box), and excessive space between sentences (orange box)
  • Figure 3: Annotations and results of different Mask R-CNN variants: (a) annotations; (b) results using vanilla Mask R-CNN; (c) results using improved Mask R-CNN
  • Figure 4: The proposed backbone: fractions denote each feature map's size relative to the input image
  • Figure 5: The improved SOLOv2 with the use of HRFPN and ResNeXt
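The captions of Figures 4 and 5 describe the ResNeXt backbone and HRFPN neck only briefly. As a hedged illustration of how an HRFPN-style neck fuses backbone stages whose sizes are fractions of the input, the NumPy sketch below upsamples every stage to the highest resolution, concatenates the stages along the channel axis, and average-pools the result into a pyramid. This follows the general design of the HRFPN neck from HRNet (for example, the `HRFPN` neck in mmdetection), which uses bilinear upsampling and learned convolutions. Here nearest-neighbour upsampling stands in for bilinear and the convolutions are omitted. The function names, the stage strides (1/4 to 1/32), and `num_outs=5` are assumptions, not the authors' configuration.

```python
import numpy as np


def upsample_nearest(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def avg_pool(x: np.ndarray, k: int) -> np.ndarray:
    """Non-overlapping k x k average pooling of a (C, H, W) map."""
    c, h, w = x.shape
    assert h % k == 0 and w % k == 0, "spatial size must be divisible by k"
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))


def hrfpn_fuse(features, num_outs=5):
    """Hypothetical HRFPN-style fusion (learned convolutions omitted).

    features: list of (C_i, H_i, W_i) arrays ordered from highest to lowest
    resolution, e.g. backbone stages at 1/4, 1/8, 1/16 and 1/32 of the input.
    """
    base_h, base_w = features[0].shape[1:]
    upsampled = []
    for f in features:
        factor = base_h // f.shape[1]
        assert f.shape[1] * factor == base_h and f.shape[2] * factor == base_w
        upsampled.append(upsample_nearest(f, factor))
    # Concatenate all stages along channels at the highest resolution.
    fused = np.concatenate(upsampled, axis=0)
    # A real HRFPN applies a 1x1 reduction conv here and a 3x3 conv per output.
    return [fused if i == 0 else avg_pool(fused, 2 ** i) for i in range(num_outs)]
```

For stages with 2, 4, 8 and 16 channels at 32x32, 16x16, 8x8 and 4x4, the sketch yields five 30-channel maps at 32x32, 16x16, 8x8, 4x4 and 2x2. Fusing at the highest resolution first is what lets fine detail from the earliest stage reach every pyramid level, which plausibly matters for thin, closely spaced Tibetan text lines.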