SMOL-MapSeg: Show Me One Label as prompt

Yunshuang Yuan, Frank Thiemann, Thorsten Dahms, Monika Sester

TL;DR

This work addresses the challenge of segmenting historical maps whose symbols vary across collections. It introduces On-Need Declarative (OND) knowledge-based prompting, a method that grounds segmentation in explicit image–label examples. SMOL-MapSeg, a SAM-based architecture adapted with Weight-Decomposed Low-Rank Adaptation (DoRA), takes two inputs, a labeled source image and an unlabeled target map, and uses a dedicated Prompt Encoder to encode the OND knowledge, enabling class-aware segmentation across arbitrary datasets. The approach outperforms baseline models, generalizes well with limited training data, and supports few-shot adaptation to entirely new classes, which suggests it can scale to broader historical-map analysis. It struggles with classes that lack distinctive local cues, such as certain water features or railways; the authors point to global context and multi-resolution training as future work to improve robustness.

Abstract

Historical maps offer valuable insights into changes on Earth's surface but pose challenges for modern segmentation models due to inconsistent visual styles and symbols. While deep learning models such as UNet and pre-trained foundation models perform well in domains like autonomous driving and medical imaging, they struggle with the variability of historical maps, where similar concepts appear in diverse forms. To address this issue, we propose On-Need Declarative (OND) knowledge-based prompting, a method that provides explicit image-label pair prompts to guide models in linking visual patterns with semantic concepts. This enables users to define and segment target concepts on demand, supporting flexible, concept-aware segmentation. Our approach replaces the prompt encoder of the Segment Anything Model (SAM) with the OND prompting mechanism and fine-tunes it on historical maps, creating SMOL-MapSeg (Show Me One Label). Unlike existing SAM-based fine-tuning methods that are class-agnostic or restricted to fixed classes, SMOL-MapSeg supports class-aware segmentation across arbitrary datasets. Experiments show that SMOL-MapSeg accurately segments user-defined classes and substantially outperforms baseline models. Furthermore, it demonstrates strong generalization even with minimal training data, highlighting its potential for scalable and adaptable historical map analysis.

Paper Structure

This paper contains 26 sections, 5 equations, 10 figures, 7 tables, 1 algorithm.

Figures (10)

  • Figure 1: The concept of segmenting images based on a newly provided labeling example (OND Knowledge). In historical maps, different maps may use different visual patterns to represent the same class (e.g., Map 1 and Map 2 using different patterns for Class A), or conversely, the same pattern to represent different classes (e.g., Class A in Map 1 and Class B in Map 2). Overlapping patterns may introduce additional complexity. This inconsistency can confuse a conventional semantic segmentation model. By prompting the model with OND knowledge—which specifies “what something is” in the current context—this ambiguity can be effectively resolved (see illustration on the right).
  • Figure 2: Examples of segmentation results of historical maps by SAM and CLIPSeg.
  • Figure 3: Overview of the proposed SMOL-MapSeg framework. Given a labeled source image and an unlabeled target image, both are processed through a shared Image Encoder (adapted through DoRA). The source image features and label are further encoded via the Prompt Encoder to capture OND knowledge. These prompt features, combined with target image features, are passed to the Mask Decoder, which predicts the segmentation mask in the target image based on the visual pattern indicated in the source label.
  • Figure 4: Segmentation results of SMOL-MapSeg (columns marked with a green box at the bottom) and MapSAM (columns marked with an orange box at the bottom) on the Siegfried datasets.
  • Figure 5: Segmentation results of SMOL-MapSeg on the Hameln dataset.
  • ...and 5 more figures
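
The data flow described in the Figure 3 caption (shared Image Encoder, Prompt Encoder, Mask Decoder) can be illustrated with a toy sketch. This is not the authors' implementation: every function name below is hypothetical, the "encoder" simply passes pixel colours through as features, the "prompt encoder" is a masked average over the labeled source pixels, and the "decoder" is a cosine-similarity threshold. These stand-ins are assumptions chosen only to show how a single source label can steer the segmentation of a target image.

```python
import math

def encode(image):
    # Stand-in for the shared Image Encoder (assumption): each pixel's
    # colour tuple is used directly as its feature vector.
    return [[tuple(float(c) for c in px) for px in row] for row in image]

def encode_prompt(source_feats, source_label):
    # Stand-in for the Prompt Encoder (assumption): average the source
    # features over the pixels marked 1 in the source label.
    dim = len(source_feats[0][0])
    total = [0.0] * dim
    count = 0
    for feat_row, label_row in zip(source_feats, source_label):
        for feat, on in zip(feat_row, label_row):
            if on:
                total = [t + f for t, f in zip(total, feat)]
                count += 1
    if count == 0:
        raise ValueError("source label marks no pixels")
    return [t / count for t in total]

def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def decode_mask(target_feats, prompt, threshold=0.9):
    # Stand-in for the Mask Decoder (assumption): mark target pixels whose
    # features are close to the prompt vector.
    return [[1 if cosine(f, prompt) >= threshold else 0 for f in row]
            for row in target_feats]

def segment(source_image, source_label, target_image, threshold=0.9):
    # Source and target go through the same encoder; the label only
    # affects the prompt, and the prompt decides what is segmented.
    prompt = encode_prompt(encode(source_image), source_label)
    return decode_mask(encode(target_image), prompt, threshold)

BLUE, GREEN = (0, 0, 255), (0, 255, 0)
source = [[BLUE, GREEN], [GREEN, BLUE]]
target = [[GREEN, BLUE], [BLUE, GREEN]]

# Labeling the blue pattern in the source selects blue pixels in the target.
mask_blue = segment(source, [[1, 0], [0, 1]], target)
# Labeling the green pattern instead selects green pixels in the same target.
mask_green = segment(source, [[0, 1], [1, 0]], target)
```

Changing only the source label changes which pattern is segmented in the same target, which is the behaviour the OND prompt is meant to provide. The actual model learns its features and decoder rather than comparing raw colours.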