SemaSK: Answering Semantics-aware Spatial Keyword Queries with Large Language Models

Zesong Zhang, Jianzhong Qi, Xin Cao, Christian S. Jensen

TL;DR

The paper addresses the lack of semantic relevance in geo-textual querying by introducing SemaSK, a semantics-aware system for spatial keyword queries. SemaSK combines an embedding-based filtering stage, which relies on embeddings prepared offline, with a retrieval-augmented generation (RAG) refinement step in which a large language model re-ranks the candidates within the query region $q.r$ according to the textual constraint $q.T$. Evaluated on Yelp-derived geo-textual data, the approach shows substantial improvements in $F1@k$ over traditional baselines (e.g., TF-IDF, LDA), with practical runtimes (0.04s for filtering and 2–3s for refinement per query). The work provides a concrete pipeline, detailed dataset preparation steps, and open-source code, highlighting the potential of LLMs for semantics-aware spatial keyword processing in real-world geo-textual search applications.
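To make the filter-then-refine idea concrete, here is a minimal sketch of the filtering step and of an $F1@k$ computation. It is not the authors' implementation. It assumes a rectangular query region, toy two-dimensional embeddings, cosine similarity as the relevance score, and the usual definition of $F1@k$ as the harmonic mean of precision and recall over the top $k$ results. The names `GeoObject`, `filter_candidates`, and `f1_at_k` are hypothetical, and the LLM refinement step is omitted because it depends on an external model.

```python
import math
from dataclasses import dataclass


@dataclass
class GeoObject:
    name: str
    lat: float
    lon: float
    embedding: list  # precomputed text embedding (toy values in this sketch)


def in_region(obj, region):
    """Check whether obj lies in a rectangle (min_lat, min_lon, max_lat, max_lon).

    A rectangular region is an assumption made for this sketch.
    """
    min_lat, min_lon, max_lat, max_lon = region
    return min_lat <= obj.lat <= max_lat and min_lon <= obj.lon <= max_lon


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_candidates(objects, region, query_embedding, m):
    """Keep objects inside the region, then return the m most similar to the query."""
    inside = [o for o in objects if in_region(o, region)]
    inside.sort(key=lambda o: cosine(o.embedding, query_embedding), reverse=True)
    return inside[:m]


def f1_at_k(ranked, relevant, k):
    """Harmonic mean of precision and recall over the top-k ranked items."""
    top = ranked[:k]
    hits = sum(1 for item in top if item in relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(top)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Toy data: C lies outside the query region and is dropped by the spatial check.
objects = [
    GeoObject("A", 0.5, 0.5, [1.0, 0.0]),
    GeoObject("B", 0.6, 0.4, [0.0, 1.0]),
    GeoObject("C", 5.0, 5.0, [1.0, 0.0]),
    GeoObject("D", 0.2, 0.8, [0.9, 0.1]),
]
candidates = filter_candidates(objects, (0.0, 0.0, 1.0, 1.0), [1.0, 0.0], m=2)
names = [o.name for o in candidates]
score = f1_at_k(names, {"A", "B"}, k=2)
```

In a full pipeline, the `candidates` list would be passed to a language model together with the query text for re-ranking, and the metric would be computed on the re-ranked list against labeled ground truth.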

Abstract

Geo-textual objects, i.e., objects with both spatial and textual attributes, such as points-of-interest or web documents with location tags, are prevalent and fuel a range of location-based services. Existing spatial keyword querying methods that target such data have focused primarily on efficiency and often involve proposals for index structures for efficient query processing. In these studies, due to challenges in measuring the semantic relevance of textual data, query constraints on the textual attributes are largely treated as a keyword matching process, ignoring richer query and data semantics. To advance the semantic aspects, we propose a system named SemaSK that exploits the semantic capabilities of large language models to retrieve geo-textual objects that are more semantically relevant to a query. Experimental results on a real dataset offer evidence of the effectiveness of the system, and a system demonstration is presented in this paper.

Paper Structure

This paper contains 8 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Querying "café" in Melbourne CBD.
  • Figure 2: Overview of the SemaSK System.
  • Figure 3: A screenshot of the SemaSK system.