Do Sentence Transformers Learn Quasi-Geospatial Concepts from General Text?

Ilya Ilyankou; Aldo Lipani; Stefano Cavazzi; Xiaowei Gao; James Haworth

TL;DR

The paper asks whether sentence transformers fine-tuned on general question-answering data can associate quasi-geospatial concepts with descriptions of hiking routes. It builds a large, template-generated text corpus from OS Maps routes and uses cosine similarity between embeddings to match 20 hiking-related queries to route descriptions. Results show limited but present zero-shot signals (e.g., for beginner routes, or coastal versus urban routes), with substantial variation across model architectures, suggesting that pre-training and architecture shape geospatial understanding. The work highlights the potential of geospatial text understanding for routing tasks and calls for more systematic, architecture-aware evaluation and richer textual representations of geospatial objects.
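The query-to-description matching step can be illustrated with a minimal sketch. It is not the authors' code: the `toy_encode` function is a stand-in bag-of-words encoder (an assumption made so the example runs without any model download), and the route descriptions and query are invented for illustration. In the study the encoder would be a sentence transformer, which produces dense vectors and can capture meaning beyond shared words; only the cosine-similarity ranking is meant to carry over.

```python
import math
import re
from collections import Counter


def toy_encode(text):
    """Stand-in encoder: bag-of-words counts (NOT a sentence transformer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_descriptions(query, descriptions, encode=toy_encode):
    """Return (score, description) pairs sorted from most to least similar."""
    q = encode(query)
    scored = [(cosine_similarity(q, encode(d)), d) for d in descriptions]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical route descriptions, invented for illustration only.
descriptions = [
    "A short flat walk along the coastal path with sea views.",
    "A long steep mountain route with a large elevation gain.",
    "A city walk through urban streets and parks.",
]

ranked = rank_descriptions("easy coastal walk", descriptions)
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

Because the encoder is passed in as the `encode` argument, swapping the toy encoder for a real embedding model would leave the ranking logic unchanged, provided `cosine_similarity` is adapted to dense vectors.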

Abstract

Sentence transformers are language models designed to perform semantic search. This study investigates the capacity of sentence transformers, fine-tuned on general question-answering datasets for asymmetric semantic search, to associate descriptions of human-generated routes across Great Britain with queries often used to describe hiking experiences. We find that sentence transformers have some zero-shot capabilities to understand quasi-geospatial concepts, such as route types and difficulty, suggesting their potential utility for routing recommendation systems.

Paper Structure (10 sections, 3 figures, 3 tables)

Introduction
Methodology
Data
Generating descriptions
Matching queries with descriptions
Visualising results
Results
Conclusion
Tables
Full results

Figures (3)

Figure 1: Cumulative mean length, grade, and elevation gain for what should be easier routes
Figure 2: Cumulative mean length, grade, and elevation gain for what should be harder routes
Figure 3: Three models, even though fine-tuned on the same dataset, disagree on which walks can be completed in under 1 hour
