SciPostLayoutTree: A Dataset for Structural Analysis of Scientific Posters

Shohei Tanaka, Atsushi Hashimoto, Yoshitaka Ushiku

TL;DR

SciPostLayoutTree addresses the need to analyze the structural organization of scientific posters. It introduces a DFS-ordered tree annotation scheme over poster BBoxes and presents a Layout Tree Decoder that fuses visual features with BBox coordinates and category embeddings. The dataset comprises 7,851 posters and contains frequent spatially challenging relations (upward, horizontal, and long-distance) that are less common in existing document datasets. The proposed decoder uses beam search to capture sequence-level plausibility and BBox embeddings to improve parent-child predictions, achieving consistent gains across backbones in both reading-order and hierarchical (parent-child) prediction. The work provides a solid baseline for poster structure analysis, and both the dataset and the code are publicly available, enabling broader development of structure-aware poster interfaces and accessibility tools.
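
The DFS-ordered tree scheme can be illustrated with a minimal Python sketch. It assumes a hypothetical representation in which each BBox has an integer ID, a parent ID (with `ROOT = 0` standing for the poster, as in Figure 1), and a sibling priority number; the function name `dfs_reading_order` is illustrative and not taken from the released code. A pre-order depth-first traversal that visits siblings by ascending priority then yields the reading order, and the same pass records each BBox's depth, the quantity behind the tree-depth distribution in Figure 2.

```python
from collections import defaultdict

ROOT = 0  # hypothetical ID for the poster-level root node


def dfs_reading_order(
    parent: dict[int, int], priority: dict[int, int]
) -> tuple[list[int], dict[int, int]]:
    """Return (reading order, depth of each BBox) for a DFS-ordered tree.

    parent[n] is the parent ID of BBox n (ROOT for top-level BBoxes);
    priority[n] orders siblings, with smaller values read first.
    """
    children: dict[int, list[int]] = defaultdict(list)
    for node, par in parent.items():
        children[par].append(node)
    for kids in children.values():
        kids.sort(key=lambda n: priority[n])

    order: list[int] = []
    depth: dict[int, int] = {}
    stack = [(ROOT, 0)]
    while stack:
        node, d = stack.pop()
        if node != ROOT:
            order.append(node)
            depth[node] = d
        # Push children in reverse so the smallest-priority child is visited first.
        for child in reversed(children[node]):
            stack.append((child, d + 1))
    return order, depth


# Toy poster: BBox 2 is a section whose children 3 and 4 are read as 4, then 3.
parent = {1: ROOT, 2: ROOT, 3: 2, 4: 2, 5: ROOT}
priority = {1: 1, 2: 3, 3: 2, 4: 1, 5: 2}
order, depth = dfs_reading_order(parent, priority)
print(order)  # [1, 5, 2, 4, 3]
print(max(depth.values()))  # 2
```

Because reading order is derived from the tree rather than stored separately, the two annotations cannot contradict each other in this representation.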

Abstract

Scientific posters play a vital role in academic communication by presenting ideas through visual summaries. Analyzing the reading order and parent-child relations of posters is essential for building structure-aware interfaces that facilitate clear and accurate understanding of research content. Despite their prevalence in academic communication, posters remain underexplored in structural analysis research, which has primarily focused on papers. To address this gap, we constructed SciPostLayoutTree, a dataset of approximately 8,000 posters annotated with reading order and parent-child relations. Compared with an existing structural analysis dataset, SciPostLayoutTree contains more instances of spatially challenging relations, including upward, horizontal, and long-distance relations. To address these challenges, we develop the Layout Tree Decoder, which incorporates visual features as well as bounding box (BBox) features, including position and category information. The model also uses beam search to predict relations while capturing sequence-level plausibility. Experimental results demonstrate that our model improves prediction accuracy for spatially challenging relations and establishes a solid baseline for poster structure analysis. The dataset is publicly available at https://huggingface.co/datasets/omron-sinicx/scipostlayouttree. The code is also publicly available at https://github.com/omron-sinicx/scipostlayouttree.
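
Why beam search helps with sequence-level plausibility can be shown with a small sketch. This is not the Layout Tree Decoder itself: it assumes a hypothetical model that, for each BBox taken in reading order (IDs 1..n, with `ROOT = 0`), outputs log-probabilities over candidate parents, and the helper names `ancestors_path` and `beam_search_parents` are illustrative. In a pre-order DFS tree, each node's parent must be the previous node or one of that node's ancestors, so an early parent choice restricts later ones; keeping the `beam_width` best partial trees by total log-probability can recover assignments that greedy decoding (`beam_width=1`) misses.

```python
import math

ROOT = 0  # hypothetical root ID; BBoxes are numbered 1..n in reading order


def ancestors_path(parents: list[int], node: int) -> list[int]:
    """Return [node, parent(node), ..., ROOT]; parents[k - 1] is the parent of node k."""
    path = [node]
    while node != ROOT:
        node = parents[node - 1]
        path.append(node)
    return path


def beam_search_parents(
    log_probs: list[dict[int, float]], beam_width: int = 3
) -> list[int]:
    """Pick one parent per BBox, keeping only valid DFS-ordered trees.

    log_probs[k - 1][p] is a (hypothetical) model log-probability that BBox k
    has parent p. Missing candidates are treated as impossible.
    """
    beams: list[tuple[float, list[int]]] = [(0.0, [])]
    for k, scores in enumerate(log_probs, start=1):
        candidates = []
        for total, parents in beams:
            # In pre-order DFS, BBox k attaches to BBox k-1 or one of its
            # ancestors (including ROOT); for k = 1 that is only ROOT.
            for p in ancestors_path(parents, k - 1):
                score = scores.get(p, -math.inf)
                if score > -math.inf:
                    candidates.append((total + score, parents + [p]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][1]


lp = [
    {ROOT: 0.0},
    {ROOT: math.log(0.6), 1: math.log(0.4)},
    {ROOT: math.log(0.04), 1: math.log(0.9), 2: math.log(0.06)},
]
print(beam_search_parents(lp, beam_width=1))  # [0, 0, 2] (greedy)
print(beam_search_parents(lp, beam_width=2))  # [0, 1, 1]
```

In the toy example, greedy decoding attaches BBox 2 to the root and can then no longer give BBox 3 its preferred parent (total probability 0.036), whereas a beam of width 2 keeps both options and finds the higher-scoring tree (0.36).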

Paper Structure

This paper contains 40 sections, 7 equations, 23 figures, 18 tables, and 1 algorithm.

Figures (23)

  • Figure 1: Example from SciPostLayoutTree. Each arrow denotes a parent-child relation, with the tail indicating the parent and the head indicating the child. The node labeled "Root" denotes the root of the DFS-ordered tree, corresponding to the poster. The number shown next to each BBox category indicates its reading order priority.
  • Figure 2: Distributions of tree depth, tree width, and number of children per node. SciPostLayoutTree (blue) and DocHieNet (orange-hatched) are displayed as stacked bars. All values are shown on a $\log_2(1 + \text{count})$ scale. Panels (a) and (b) show frequencies per 1,000 pages; panel (c) shows frequencies per 1,000 nodes.
  • Figure 3: Reading order heatmaps by direction and distance. The heatmaps show results from SciPostLayoutTree (left) and DocHieNet (right), with counts normalized per 1,000 pages. Distances are binned on a $\log_2$ scale, and heatmap values represent $\log_2(1 + \text{count})$. The numbers below each direction indicate the mean count per page.
  • Figure 4: Parent-child relation heatmaps by direction and distance. The format follows Fig. 3.
  • Figure 5: Overview of Layout Tree Decoder. We extend DRGG by incorporating BBox features and beam search.
  • ...and 18 more figures
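
Figures 3 and 4 bin relations by direction and distance. As an illustration only, the sketch below assumes each relation is a (source BBox, target BBox) pair in pixel coordinates, assigns one of four directions by the dominant axis of the center offset, and bins the center distance as floor(log2(1 + d)); the paper's exact direction and distance definitions may differ, and the function names are hypothetical. It then normalizes counts per 1,000 pages and applies the $\log_2(1 + \text{count})$ transform described in the captions.

```python
import math
from collections import Counter

# Hypothetical BBox format: (x0, y0, x1, y1) in pixels, with y growing downward.
BBox = tuple[float, float, float, float]


def center(b: BBox) -> tuple[float, float]:
    return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)


def direction(src: BBox, dst: BBox) -> str:
    """Assumed scheme: direction of the dominant axis of the center offset."""
    (sx, sy), (dx, dy) = center(src), center(dst)
    ox, oy = dx - sx, dy - sy
    if abs(oy) >= abs(ox):
        return "down" if oy >= 0 else "up"
    return "right" if ox > 0 else "left"


def distance_bin(src: BBox, dst: BBox) -> int:
    """log2 bin of the Euclidean center distance: floor(log2(1 + d))."""
    (sx, sy), (dx, dy) = center(src), center(dst)
    return math.floor(math.log2(1 + math.hypot(dx - sx, dy - sy)))


def heatmap(pages: list[list[tuple[BBox, BBox]]]) -> dict[tuple[str, int], float]:
    """Count (direction, distance bin) cells, normalize per 1,000 pages, then log2(1 + x)."""
    if not pages:
        return {}
    counts: Counter = Counter()
    for pairs in pages:
        for src, dst in pairs:
            counts[(direction(src, dst), distance_bin(src, dst))] += 1
    per_1000 = 1000 / len(pages)
    return {cell: math.log2(1 + n * per_1000) for cell, n in counts.items()}


down = ((0, 0, 100, 50), (0, 100, 100, 150))  # next block 100 px below
up = ((0, 500, 100, 550), (200, 0, 300, 50))  # jump to the top of the next column
cells = heatmap([[down, up], [down]])
print(sorted(cells))  # [('down', 6), ('up', 9)]
```

Using floor(log2(1 + d)) keeps zero-distance pairs in bin 0 and makes each bin twice as wide as the previous one, so long-distance jumps such as column changes remain visible alongside short local steps.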