Towards Khmer Scene Document Layout Detection

Marry Kong; Rina Buoy; Sovisal Chenda; Nguonly Taing; Masakazu Iwamura; Koichi Kise

TL;DR

This paper presents a novel framework comprising three key elements: a robust training and benchmarking dataset specifically for Khmer scene layouts; an open-source document augmentation tool capable of synthesizing realistic scene documents to scale training data; and layout detection baselines utilizing YOLO-based architectures with oriented bounding boxes (OBB) to handle geometric distortions.

Abstract

While document layout analysis for Latin scripts has advanced significantly, driven by the advent of large multimodal models (LMMs), progress for the Khmer language remains constrained by the scarcity of annotated training data. This gap is particularly acute for scene documents, where perspective distortions and complex backgrounds challenge traditional methods. Given the structural complexities of Khmer script, such as diacritics and multi-layer character stacking, existing Latin-based layout analysis models fail to accurately delineate semantic layout units, particularly in dense text regions (e.g., list items). In this paper, we present the first comprehensive study on Khmer scene document layout detection. We contribute a novel framework comprising three key elements: (1) a robust training and benchmarking dataset specifically for Khmer scene layouts; (2) an open-source document augmentation tool capable of synthesizing realistic scene documents to scale training data; and (3) layout detection baselines utilizing YOLO-based architectures with oriented bounding boxes (OBB) to handle geometric distortions. To foster further research in the Khmer document analysis and recognition (DAR) community, we release our models, code, and datasets in a gated repository (in review).
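To make the link between geometric distortion and oriented bounding boxes concrete, the following is a minimal sketch and not the paper's augmentation tool. It assumes a perspective distortion is given as a 3x3 homography, that the source annotation is an axis-aligned box, and that the OBB label is written as a class id followed by four corner points normalized by image size. All function names (`warp_point`, `warp_box_to_quad`, `quad_to_label`, `quad_inside_image`) are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def warp_point(h: Matrix3, pt: Point) -> Point:
    """Apply a 3x3 homography to a 2D point, including the homogeneous divide."""
    x, y = pt
    xw = h[0][0] * x + h[0][1] * y + h[0][2]
    yw = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return (xw / w, yw / w)


def warp_box_to_quad(h: Matrix3, box: Tuple[float, float, float, float]) -> List[Point]:
    """Warp an axis-aligned box (x_min, y_min, x_max, y_max) into a quadrilateral.

    Corners are emitted in the order top-left, top-right, bottom-right,
    bottom-left of the ORIGINAL box, so the ordering survives the warp.
    """
    x_min, y_min, x_max, y_max = box
    corners = [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]
    return [warp_point(h, c) for c in corners]


def quad_inside_image(quad: Sequence[Point], width: float, height: float) -> bool:
    """Return True if every corner lies within the image bounds.

    A simple filter (an assumption of this sketch) for discarding boxes
    that an extreme warp has pushed outside the frame.
    """
    return all(0 <= x <= width and 0 <= y <= height for x, y in quad)


def quad_to_label(class_id: int, quad: Sequence[Point], width: float, height: float) -> str:
    """Format a quadrilateral as 'class x1 y1 x2 y2 x3 y3 x4 y4' with normalized coordinates."""
    coords = []
    for x, y in quad:
        coords.append(f"{x / width:.6f}")
        coords.append(f"{y / height:.6f}")
    return " ".join([str(class_id)] + coords)


if __name__ == "__main__":
    # Hypothetical mild perspective warp: the bottom row of H is not (0, 0, 1).
    H = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.001, 0.0, 1.0]]
    quad = warp_box_to_quad(H, (10.0, 20.0, 110.0, 70.0))
    if quad_inside_image(quad, 200, 100):
        print(quad_to_label(0, quad, 200, 100))
```

Under a perspective warp the four corners no longer form an axis-aligned rectangle, which is why a four-point (oriented) representation preserves more of the region's geometry than re-fitting an upright box around the warped corners.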

Paper Structure (15 sections, 7 figures, 6 tables)

Introduction
Related Work
Latin Document Layout Analysis
Khmer Document Layout Analysis
Methodology
Dataset Construction
Layout Augmentation
Training Khmer Scene DLA Models
Experimental Setup
Results and Discussion
Layout Detection Performance
Layout Detection Performance Comparison with the Existing Methods
Qualitative Evaluation
Limitations and Future Work
Conclusions

Figures (7)

Figure 1: Sample Khmer text layout (with permission from buoy2025addressing). Blue: consonant subscript. Green: base consonant. Orange: dependent vowel. Purple: diacritic. Best viewed in color.
Figure 2: The overall methodology of this study.
Figure 3: Sample annotated images after human curation.
Figure 4: Sample cases of extreme augmentation with corrupted bounding boxes.
Figure 5: Sample augmented images with their augmented annotations.
...and 2 more figures
