Map Feature Perception Metric for Map Generation Quality Assessment and Loss Optimization

Chenxing Sun, Jing Bai

TL;DR

The paper addresses the challenge of evaluating and optimizing the authenticity of synthetically generated maps beyond pixel-level fidelity. It introduces the Map Feature Perception Metric (MFP), which leverages DINO-ViT/ViT-based deep features to capture global structure and spatial relationships through a dual loss comprising global feature and spatial semantic components, with loss weights $\lambda_1 = 10$ and $\lambda_2 = 1$. Empirical results show that incorporating MFP as a loss yields improvements across GAN and diffusion-model baselines on map datasets, with gains ranging from 2% to 50% relative to traditional losses, and diffusion models generally outperforming GANs. The work demonstrates that explicitly modeling cartographic global attributes and spatial coherence enhances geographic plausibility, supported by qualitative visualizations and validity analyses that align with perceptual map quality and practical applications.

Abstract

In intelligent cartographic generation tasks empowered by generative models, the authenticity of synthesized maps is a critical determinant of quality. The selection of appropriate evaluation metrics to quantify map authenticity is therefore a pivotal research challenge. Current methodologies predominantly adopt computer vision-based image assessment metrics to compute discrepancies between generated and reference maps. However, conventional visual similarity metrics (including L1, L2, SSIM, and FID) primarily operate at the pixel level and inadequately capture cartographic global features and spatial correlations, which induces semantic-structural artifacts in generated outputs. This study introduces a novel Map Feature Perception Metric (MFP) designed to evaluate global characteristics and spatial congruence between synthesized and target maps. Diverging from pixel-wise metrics, our approach extracts element-level deep features that comprehensively encode cartographic structural integrity and topological relationships. Experimental validation demonstrates MFP's superior capability in evaluating cartographic semantic features, with classification-enhanced implementations outperforming conventional loss functions across diverse generative frameworks. When employed as an optimization objective, our metric achieves performance gains ranging from 2% to 50% across multiple benchmarks compared to traditional L1, L2, and SSIM baselines. This investigation concludes that explicit consideration of cartographic global attributes and spatial coherence substantially enhances generative model optimization, thereby significantly improving the geographical plausibility of synthesized maps.

Paper Structure

This paper contains 26 sections, 3 equations, 10 figures, 7 tables.

Figures (10)

  • Figure 1: Overview of the methods for evaluating map features. The [CLS] token is an additional learnable global feature in ViT, and $K^L$ denotes the Key of the self-attention (QKV) layer in the $L$-th (i.e., last) block, which is used to represent the deep semantic features of the map. From these, the global feature $G'$ and semantic similarity $S'$ of the map are calculated.
  • Figure 2: Forward propagation process of ViT. 1*: The overall layout is represented with an additional learnable classification token. 2*: ViT's multi-head self-attention mechanism after multiple propagation layers, where the Key tensor focuses on rich semantic features in each image patch.
  • Figure 3: The Maps and MLMG-US datasets: sample paired aerial images (left) and map images (right).
  • Figure 4: Qualitative results of each model on the Maps datasets using different methods. The asterisk (*) denotes model results with the additional incorporation of our map feature loss function. (a) Input. (b) GT. (c) Pix2pix. (d) Pix2pix*. (e) Pix2pixHD. (f) Pix2pixHD*. (g) CycleGAN. (h) CycleGAN*. (i) ATME. (j) ATME*. (k) C2GM. (l) C2GM*.
  • Figure 5: Qualitative results of each model on the US-MLMG datasets using different methods. The asterisk (*) denotes model results with the additional incorporation of our map feature loss function. (a) Input. (b) GT. (c) Pix2pix. (d) Pix2pix*. (e) CycleGAN. (f) CycleGAN*. (g) SMAPGAN. (h) SMAPGAN*. (i) ATME. (j) ATME*. (k) C2GM. (l) C2GM*.
  • ...and 5 more figures
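
The TL;DR and the Figure 1 caption together describe a two-part loss: a global term built from the ViT [CLS] token and a spatial semantic term built from the last-layer Keys $K^L$, combined with weights $\lambda_1 = 10$ and $\lambda_2 = 1$. The following is a minimal sketch of how such a combination could be computed, not the authors' implementation. It assumes the features have already been extracted, that the global term is a mean squared difference between [CLS] vectors, that the spatial term compares cosine self-similarity matrices of the Keys, and that $\lambda_1$ weights the global term and $\lambda_2$ the spatial term (the source does not state this mapping). All function names are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def cosine_self_similarity(keys):
    """Pairwise cosine similarity between patch key vectors (n x d -> n x n)."""
    # Guard against zero-length vectors by falling back to a norm of 1.0.
    norms = [math.sqrt(sum(v * v for v in k)) or 1.0 for k in keys]
    return [
        [sum(a * b for a, b in zip(ki, kj)) / (ni * nj)
         for kj, nj in zip(keys, norms)]
        for ki, ni in zip(keys, norms)
    ]


def global_feature_loss(cls_gen, cls_ref):
    """Assumed form: mean squared difference between two [CLS] vectors."""
    return sum((g - r) ** 2 for g, r in zip(cls_gen, cls_ref)) / len(cls_gen)


def spatial_semantic_loss(keys_gen, keys_ref):
    """Assumed form: mean squared difference between Key self-similarity matrices."""
    s_gen = cosine_self_similarity(keys_gen)
    s_ref = cosine_self_similarity(keys_ref)
    n = len(s_gen)
    total = sum(
        (s_gen[i][j] - s_ref[i][j]) ** 2 for i in range(n) for j in range(n)
    )
    return total / (n * n)


def mfp_loss(cls_gen, cls_ref, keys_gen, keys_ref, lambda_1=10.0, lambda_2=1.0):
    """Weighted sum of the two terms; the weight-to-term mapping is an assumption."""
    return (lambda_1 * global_feature_loss(cls_gen, cls_ref)
            + lambda_2 * spatial_semantic_loss(keys_gen, keys_ref))


if __name__ == "__main__":
    # Toy features: a 2-dimensional [CLS] vector and two 2-dimensional patch Keys.
    cls_generated, cls_reference = [1.0, 0.0], [0.0, 0.0]
    keys_generated = [[1.0, 0.0], [0.0, 1.0]]
    keys_reference = [[1.0, 0.0], [1.0, 0.0]]
    print(mfp_loss(cls_generated, cls_reference, keys_generated, keys_reference))
```

Because the spatial term in this sketch compares self-similarity matrices rather than raw Keys, it depends only on how patches relate to one another, which matches the summary's emphasis on spatial relationships over pixel-level agreement. In a training loop the same computation would be written with a differentiable tensor library so that gradients reach the generator.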