Generalizable Multiscale Segmentation of Heterogeneous Map Collections

Remi Petitpierre

TL;DR

This article introduces Semap, a new open benchmark dataset comprising 1,439 manually annotated patches designed to reflect the variety of historical map documents, and presents a segmentation framework that combines procedural data synthesis with multiscale integration to improve robustness and transferability.

Abstract

Historical map collections are highly diverse in style, scale, and geographic focus, often consisting of many single-sheet documents. Yet most work in map recognition focuses on specialist models tailored to homogeneous map series. In contrast, this article aims to develop generalizable semantic segmentation models and a generalizable ontology. First, we introduce Semap, a new open benchmark dataset comprising 1,439 manually annotated patches designed to reflect the variety of historical map documents. Second, we present a segmentation framework that combines procedural data synthesis with multiscale integration to improve robustness and transferability. This framework achieves state-of-the-art performance on both the HCMSSD and Semap datasets, showing that a diversity-driven approach to map recognition is not only viable but also beneficial. The results indicate that segmentation performance remains largely stable across map collections, scales, geographic regions, and publication contexts. By proposing benchmark datasets and methods for the generic segmentation of historical maps, this work opens the way to integrating the long tail of cartographic archives into historical geographic studies.

Paper Structure

This paper contains 24 sections, 5 equations, 15 figures, and 9 tables.

Figures (15)

  • Figure 1: Manually annotated training samples. Real map crops (top) with corresponding labels (bottom). The label color code follows the legend of Figure 2.
  • Figure 2: Semantic masks retrieved from MapTiler API, displayed as a function of the zoom level.
  • Figure 3: Synthetically generated training samples. Synthetic map images (top) and corresponding labels (bottom). The label color code follows the legend of Figure 2.
  • Figure 4: Confusion matrix based on Semap test set predictions. The matrix is computed as the average over all test samples (micro-average) and normalized per class, (a) w.r.t. the predictions or (b) w.r.t. the ground truth. The diagonal values correspond to (a) precision and (b) recall.
  • Figure 5: Standardized effect of map-metadata variables on mIoU. Vertical bars indicate multivariate linear ordinary least-squares regression coefficients, with 95% CI. The dependent variable, mIoU, denotes the per-patch (n=801) average intersection over union standardized within each dataset partition. No systematic bias is detected, apart from a slight positive effect for maps covering locations in Indonesia or Turkey. Overall, the regression explains little variance ($R^{2} = 0.04$), suggesting stable segmentation performance across evaluated metadata classes.
  • ...and 10 more figures
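
The two normalizations described in the Figure 4 caption can be illustrated with a minimal sketch. It assumes that "micro-average" means pooling pixel counts over all test patches before normalizing, and that rows index the ground-truth class while columns index the predicted class; neither convention is confirmed by the text above. The function names and the two-class toy counts are hypothetical and are not taken from Semap.

```python
def pool_counts(per_patch):
    """Sum per-patch confusion matrices element-wise (assumed micro-average)."""
    n = len(per_patch[0])
    pooled = [[0] * n for _ in range(n)]
    for matrix in per_patch:
        for i in range(n):
            for j in range(n):
                pooled[i][j] += matrix[i][j]
    return pooled


def normalize(counts, by):
    """Normalize a confusion matrix of pixel counts.

    counts[i][j] = pixels of ground-truth class i predicted as class j.
    by="pred":  every column sums to 1, so the diagonal holds precision.
    by="truth": every row sums to 1, so the diagonal holds recall.
    """
    n = len(counts)
    if by == "truth":
        totals = [sum(row) for row in counts]
        return [[counts[i][j] / totals[i] if totals[i] else 0.0
                 for j in range(n)] for i in range(n)]
    if by == "pred":
        totals = [sum(counts[i][j] for i in range(n)) for j in range(n)]
        return [[counts[i][j] / totals[j] if totals[j] else 0.0
                 for j in range(n)] for i in range(n)]
    raise ValueError("by must be 'pred' or 'truth'")


# Hypothetical toy data: two patches, two classes (illustration only).
toy_patches = [
    [[4, 1], [0, 5]],
    [[4, 1], [1, 4]],
]
pooled = pool_counts(toy_patches)
by_pred = normalize(pooled, "pred")
by_truth = normalize(pooled, "truth")
precision = [by_pred[k][k] for k in range(len(pooled))]
recall = [by_truth[k][k] for k in range(len(pooled))]
print("precision per class:", precision)
print("recall per class:", recall)
```

Pooling counts before normalizing weights every pixel equally, so large patches or frequent classes dominate the totals. Averaging per-patch normalized matrices instead would weight every patch equally and generally gives different values.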