RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization

Avinash Anand; Raj Jaiswal; Mohit Gupta; Siddhesh S Bangar; Pijush Bhuyan; Naman Lal; Rajeev Singh; Ritika Jha; Rajiv Ratn Shah; Shin'ichi Satoh

TL;DR

This work addresses the challenge of domain shift in document layout detection by introducing RanLayNet, a synthetic, automatically labeled dataset that diversifies layout configurations through composite image generation and noisy-label augmentation. The authors train and evaluate a YOLOv8-based layout detector across PubLayNet, IIIT-AR-13K, Doclaynet, and RanLayNet, demonstrating that models trained with RanLayNet achieve competitive or superior cross-domain performance, including a $mAP_{95}$ of up to $0.588$ for the TABLE class in scientific documents. The approach reduces labeling burden while improving robustness to domain variability, suggesting strong potential for domain adaptation and generalization in real-world document understanding tasks. Overall, RanLayNet offers a versatile data-generation paradigm that enhances cross-domain layout recognition without reliance on extensive manual annotations.

Abstract

Large ground-truth datasets and recent advances in deep learning have driven progress in layout detection. However, these datasets offer limited layout diversity, so training on them requires a sizable number of annotated instances, which are both expensive and time-consuming to produce. As a result, differences between the source and target domains can significantly degrade model performance. Domain adaptation approaches address this problem by using a small quantity of labeled data to adjust the model to the target domain. In this work, we introduce RanLayNet, a synthetic document dataset enriched with automatically assigned labels denoting the spatial positions, ranges, and types of layout elements. The primary aim is to provide a versatile dataset for training models that are robust and adaptable to diverse document formats. Through empirical experimentation, we demonstrate that a deep layout identification model trained on our dataset performs better than a model trained solely on actual documents. Moreover, we conduct a comparative analysis by fine-tuning inference models using both PubLayNet and IIIT-AR-13K datasets on the Doclaynet dataset. Our findings show that models enriched with our dataset perform best, achieving mAP95 scores of 0.398 and 0.588 for the TABLE class in the scientific document domain.
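The Figure 3 caption describes the generation pipeline as pasting PubLayNet crops at random onto a white canvas according to the space that remains. The following is a minimal sketch of that idea, not the authors' implementation: it works on box sizes only (no pixel data), and the names `Crop`, `Box`, `compose_page`, and the `gap` parameter are hypothetical. It assumes a simple top-to-bottom placement rule, which the source does not specify.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Crop:
    label: str   # layout class of the crop, e.g. "table" or "text"
    width: int
    height: int


@dataclass(frozen=True)
class Box:
    label: str   # class copied from the crop, so the label is automatic
    x: int
    y: int
    width: int
    height: int


def compose_page(crops, canvas_w, canvas_h, gap=10, seed=None):
    """Place crops top to bottom on a blank canvas while space remains.

    Assumed placement rule (hypothetical): shuffle the crops, then stack
    them vertically with a random horizontal offset, skipping any crop
    that does not fit in the remaining area.
    """
    rng = random.Random(seed)
    order = list(crops)
    rng.shuffle(order)

    boxes = []
    y = gap  # top edge of the next free slot
    for crop in order:
        if crop.width > canvas_w - 2 * gap:
            continue  # wider than the usable canvas
        if y + crop.height + gap > canvas_h:
            continue  # not enough vertical space left for this crop
        x = rng.randint(gap, canvas_w - gap - crop.width)
        boxes.append(Box(crop.label, x, y, crop.width, crop.height))
        y += crop.height + gap
    return boxes


if __name__ == "__main__":
    sample = [Crop("text", 300, 120), Crop("table", 400, 200), Crop("title", 200, 40)]
    for box in compose_page(sample, canvas_w=600, canvas_h=400, seed=0):
        print(box)
```

Because each placed box carries the class of the crop it came from, the composite page is labeled without manual annotation, which is the property the abstract attributes to RanLayNet.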

Paper Structure (13 sections, 6 figures, 7 tables)

Introduction
Related Work
Datasets
PubLayNet
IIIT-AR-13K
Doclaynet
RanLayNet
Methodology
Experiments
Conclusion
Future Scope
Acknowledgement
Appendix

Figures (6)

Figure 1: RanLayNet Dataset Sample
Figure 2: Doclaynet Results on Fine-tuned RanLayNet model
Figure 3: Pipeline of RanLayNet generation. Crops of PubLayNet are randomly pasted on white canvas on the basis of remaining spaces on the canvas.
Figure 4: YOLOv8 Training/Validation Loss Curves for IIIT-AR-13K Dataset
Figure 5: YOLOv8 Training/Validation Loss Curves for PubLayNet
...and 1 more figure
