MAPWise: Evaluating Vision-Language Models for Advanced Map Queries

Srija Mukhopadhyay; Abhishek Rajgaria; Prerana Khatiwada; Vivek Gupta; Dan Roth

TL;DR

This study investigates the efficacy of VLMs in answering questions based on choropleth maps, which are widely used for data analysis and representation, and introduces a novel map-based question-answering benchmark, consisting of maps from three geographical regions.

Abstract

Vision-language models (VLMs) excel at tasks requiring joint understanding of visual and linguistic information. A particularly promising yet under-explored application for these models lies in answering questions based on various kinds of maps. This study investigates the efficacy of VLMs in answering questions based on choropleth maps, which are widely used for data analysis and representation. To facilitate and encourage research in this area, we introduce a novel map-based question-answering benchmark, consisting of maps from three geographical regions (United States, India, China), each containing 1000 questions. Our benchmark incorporates 43 diverse question templates, requiring nuanced understanding of relative spatial relationships, intricate map features, and complex reasoning. It also includes maps with discrete and continuous values, encompassing variations in color-mapping, category ordering, and stylistic patterns, enabling comprehensive analysis. We evaluate the performance of multiple VLMs on this benchmark, highlighting gaps in their abilities and providing insights for improving such models.

Paper Structure (39 sections, 6 figures, 46 tables, 1 algorithm)

Introduction
The MAPWise Dataset
Dataset Creation
Data Sources.
Map Variations.
Question Generation.
Dataset Validation.
Experimental Evaluation
Baseline Models
Closed-Source MLLMs.
Open-Source VLMs.
Prompting Strategies
Evaluation Details
Note for Open Source VLMs.
Results and Analysis
...and 24 more sections

Figures (6)

Figure 1: A question-map pair from our MAPWise dataset and the corresponding gold truth answer.
Figure 2: Examples of maps with and without annotations for the same underlying data. Hatched maps were additionally created to better assess model understanding and performance.
Figure 3: Examples of maps with imaginary names, shuffled names, and jumbled values for the same underlying data.
Figure 4: Zero-shot CoT prompt representation.
Figure 5: Example of a few-shot CoT prompt with visual-to-textual representation.
...and 1 more figure
