Comics Datasets Framework: Mix of Comics datasets for detection benchmarking

Emanuele Vivoli, Irene Campaioli, Mariateresa Nardoni, Niccolò Biondi, Marco Bertini, Dimosthenis Karatzas

TL;DR

The paper tackles reproducibility and comparability gaps in comics object detection by introducing the Comics Datasets Framework (CDF). It standardizes annotations via the Unified Comics Annotation (UCA) format, consolidates multiple datasets (including the newly curated Comics100) to balance manga and American styles, and provides a pipeline for converting data to common formats (CVAT/COCO) and for standardized evaluation. It benchmarks a range of detectors, from convolutional models (Faster R-CNN, SSD, YOLO) to zero-shot and transformer-based approaches (GroundingDINO, DASS, Magi), under consistent train/test splits and metrics. The framework and accompanying resources (code, weights) enable fair comparisons and reproducible experimentation, aiming to clarify the comics research landscape and support future multi-modal tasks requiring precise object recognition.

Abstract

Comics, as a medium, uniquely combine text and images in styles often distinct from real-world visuals. For the past three decades, computational research on comics has evolved from basic object detection to more sophisticated tasks. However, the field faces persistent challenges such as small datasets, inconsistent annotations, inaccessible model weights, and results that cannot be directly compared due to varying train/test splits and metrics. To address these issues, we aim to standardize annotations across datasets, introduce a variety of comic styles into the datasets, and establish benchmark results with clear, replicable settings. Our proposed Comics Datasets Framework standardizes dataset annotations into a common format and addresses the overrepresentation of manga by introducing Comics100, a curated collection of 100 books from the Digital Comics Museum, annotated for detection in our uniform format. We have benchmarked a variety of detection architectures using the Comics Datasets Framework. All related code, model weights, and detailed evaluation processes are available at https://github.com/emanuelevivoli/cdf, ensuring transparency and facilitating replication. This initiative is a significant advancement towards improving object detection in comics, laying the groundwork for more complex computational tasks dependent on precise object recognition.

Paper Structure (15 sections, 5 equations, 3 figures, 8 tables)

Figures (3)

  • Figure 1: Unification pipeline schema: given a dataset in an origin format, through a specialized adapter we obtain the XML unified format. This can be converted to CVAT, COCO, or any format required.
  • Figure 2: Dataset composition.
  • Figure 3: Number of annotation types per dataset.
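The last step of the unification pipeline in Figure 1 (unified XML to COCO) can be illustrated with a minimal sketch. The XML layout below is an assumption modeled on CVAT-style image annotations (`<image>` elements containing `<box>` elements with `xtl`, `ytl`, `xbr`, `ybr` corners); it is not the actual UCA schema, and the function name `unified_xml_to_coco`, the sample file names, and the labels are hypothetical. Only the output structure (COCO `images`, `annotations`, `categories` with `[x, y, width, height]` boxes) follows the standard COCO detection layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical unified annotation file (CVAT-like layout, NOT the real UCA schema).
SAMPLE_XML = """<annotations>
  <image id="0" name="book1/page_001.jpg" width="1000" height="1500">
    <box label="panel" xtl="10" ytl="20" xbr="510" ybr="720"/>
    <box label="character" xtl="100" ytl="200" xbr="300" ybr="600"/>
  </image>
  <image id="1" name="book1/page_002.jpg" width="1000" height="1500">
    <box label="text" xtl="50.5" ytl="60" xbr="150.5" ybr="110"/>
  </image>
</annotations>"""


def unified_xml_to_coco(xml_text):
    """Convert the assumed unified XML into a COCO-style detection dict."""
    root = ET.fromstring(xml_text)
    images, annotations, label_to_id = [], [], {}

    for image in root.iter("image"):
        image_id = int(image.get("id"))
        images.append({
            "id": image_id,
            "file_name": image.get("name"),
            "width": int(image.get("width")),
            "height": int(image.get("height")),
        })
        for box in image.iter("box"):
            # Assign category ids in order of first appearance, starting at 1.
            label = box.get("label")
            category_id = label_to_id.setdefault(label, len(label_to_id) + 1)
            # Corner coordinates -> COCO [x, y, width, height].
            x1, y1 = float(box.get("xtl")), float(box.get("ytl"))
            x2, y2 = float(box.get("xbr")), float(box.get("ybr"))
            w, h = x2 - x1, y2 - y1
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": category_id,
                "bbox": [x1, y1, w, h],
                "area": w * h,
                "iscrowd": 0,
            })

    categories = [{"id": i, "name": name} for name, i in label_to_id.items()]
    return {"images": images, "annotations": annotations, "categories": categories}


coco = unified_xml_to_coco(SAMPLE_XML)
```

Serializing the result with `json.dumps(coco)` would give a file that standard COCO tooling can read. The dataset-specific adapters in Figure 1 would sit before this step and are not sketched here, since their input formats differ per dataset.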