ICDAR 2023 Competition on Hierarchical Text Detection and Recognition

Shangbang Long, Siyang Qin, Dmitry Panteleev, Alessandro Bissacco, Yasuhisa Fujii, Michalis Raptis

TL;DR

The paper presents the ICDAR 2023 Competition on Hierarchical Text Detection and Recognition (HTDR), which formalizes a unified task combining text detection, recognition, and geometric layout analysis. The competition uses the HierText dataset and defines two tracks: hierarchical text detection, scored with a PQ-based metric at each level of the hierarchy, and word-level end-to-end recognition, scored with F1. Results show substantial gains over the prior Unified Detector, driven by multi-head/post-processing hierarchical methods and two-stage pipelines, while end-to-end approaches lag behind in this setting. The report offers practical insights into dataset design, evaluation, and the tradeoffs between hierarchical detection and end-to-end recognition, and outlines future directions including multilingual data and expanded HTDR benchmarks.
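To make the two metric families concrete, the sketch below computes the standard Panoptic Quality (PQ) and F1 scores from a list of matched IoU values. It is a minimal illustration, not the competition's evaluation code: the function names, the input format (one IoU per already-matched prediction/ground-truth pair), and the 0.5 IoU threshold are assumptions made for this example.

```python
from typing import List


def count_matches(matched_ious: List[float], num_pred: int, num_gt: int,
                  iou_threshold: float = 0.5):
    """Split instances into TP / FP / FN given one IoU per candidate pair.

    Assumption: each prediction and each ground truth appears in at most
    one candidate pair, so pairs above the threshold are true positives.
    """
    tp_ious = [iou for iou in matched_ious if iou > iou_threshold]
    tp = len(tp_ious)
    fp = num_pred - tp  # predictions without a valid match
    fn = num_gt - tp    # ground truths without a valid match
    return tp_ious, tp, fp, fn


def panoptic_quality(matched_ious: List[float], num_pred: int, num_gt: int,
                     iou_threshold: float = 0.5) -> float:
    """PQ = (sum of IoU over true positives) / (TP + 0.5 * FP + 0.5 * FN)."""
    tp_ious, tp, fp, fn = count_matches(matched_ious, num_pred, num_gt,
                                        iou_threshold)
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(tp_ious) / denom if denom > 0 else 0.0


def f1_score(matched_ious: List[float], num_pred: int, num_gt: int,
             iou_threshold: float = 0.5) -> float:
    """F1 = TP / (TP + 0.5 * FP + 0.5 * FN), the harmonic mean of P and R."""
    _, tp, fp, fn = count_matches(matched_ious, num_pred, num_gt,
                                  iou_threshold)
    denom = tp + 0.5 * fp + 0.5 * fn
    return tp / denom if denom > 0 else 0.0


# Hypothetical example: 4 predicted words, 3 ground-truth words,
# and three candidate pairs with the IoUs listed below.
example_ious = [0.9, 0.8, 0.4]
pq = panoptic_quality(example_ious, num_pred=4, num_gt=3)
f1 = f1_score(example_ious, num_pred=4, num_gt=3)
```

Under these definitions PQ equals F1 multiplied by the mean IoU of the true positives, so PQ rewards tight localization in addition to correct detection. For the hierarchical track, the same computation would be repeated separately for word, line, and paragraph instances.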

Abstract

We organize a competition on hierarchical text detection and recognition. The competition aims to promote research into deep learning models and systems that can jointly perform text detection, recognition, and geometric layout analysis. We present details of the competition organization, including tasks, datasets, evaluations, and schedule. During the competition period (January 2nd, 2023 to April 1st, 2023), more than 20 teams made at least 50 submissions across the 2 proposed tasks. Considering the number of teams and submissions, we conclude that the HierText competition has been successfully held. In this report, we also present the competition results and the insights drawn from them.

Paper Structure (16 sections, 1 equation, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Illustration for the proposed unified task: Hierarchical Text Detection and Recognition (HTDR). Given an input image, the unified model is expected to produce a hierarchical text representation, which resembles the form of a forest. Each tree in the forest represents one paragraph and has three layers, representing the clustering of words into lines and then paragraphs.
  • Figure 2: Example of hierarchical annotation format of the dataset.
  • Figure 3: Illustration for the hierarchical annotation of text in images. From left to right: word, line, paragraph level annotations. Words (blue) are annotated with polygons. Lines (green) and paragraphs (yellow) are annotated as hierarchical clusters and visualized as polygons. Images are taken from the train split.
  • Figure 4: Illustration of how hierarchical text detection can be evaluated as 3 instance segmentation sub-tasks. The coloring of each column indicates the instance segmentation for each sub-task.
  • Figure 5: Character set in the training split.
  • ...and 3 more figures