Unifying points of interest taxonomies: mapping OpenStreetMap tags to the Foursquare category system
Lilou Soulas, Lorenzo Lucchini, Maurizio Napolitano, Sebastiano Bontorin, Simone Centellegher, Bruno Lepri, Riccardo Gallotti, Eleonora Andreotti
TL;DR
This work tackles the interoperability challenge posed by heterogeneous POI taxonomies in OpenStreetMap (OSM) and Foursquare (FS). It introduces a three-part framework: a manually curated benchmark mapping OSM tags to FS categories, an embedding-based semantic alignment stage, and an LLM-based refinement stage that selects the best match among the retrieved candidates, all supported by a scalable update pipeline. The authors release cleaned taxonomies, the oracle benchmark, enriched FS descriptions, embedding results, and end-to-end notebooks to enable reproducible evaluation and long-term maintenance. The results show that embedding retrieval combined with LLM refinement substantially improves alignment accuracy over baseline methods, achieving around 85% coverage at the top FS level and over 72% at deeper levels. Overall, the work delivers an openly available benchmark and toolchain for unifying heterogeneous POI taxonomies in urban analytics and smart city applications.
Abstract
The heterogeneity of Point of Interest (POI) taxonomies is a persistent challenge for the integration of urban datasets and the development of location-based services. OpenStreetMap (OSM) adopts a flexible, community-driven tagging system, while Foursquare (FS) relies on a curated hierarchical structure. Here we present an openly available benchmark and mapping framework that aligns OSM tags with the FS taxonomy. This resource integrates the richness of community-driven OSM data with the hierarchical structure of FS, enabling reproducible and interoperable urban analytics. The dataset is complemented by an evaluation of embedding-based and LLM-based alignment strategies and by a pipeline that supports scalable updates as OSM evolves. Together, these elements provide both a robust reference resource and a practical tool for the community. Our approach is structured around three components: the construction of a manually curated benchmark that serves as a gold standard, the evaluation of pretrained text embedding models for semantic alignment between OSM tags and FS categories, and an LLM-based refinement stage that enhances robustness and adaptability. The proposed methodology provides a scalable and reproducible solution for taxonomy unification, with direct applications to urban analytics, mobility studies, and smart city services.
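
The three components described above (gold-standard benchmark, embedding-based candidate retrieval, refinement step) can be illustrated with a minimal sketch. This is not the authors' implementation. The tag descriptions, category paths, and gold mapping below are hypothetical toy data. A bag-of-words vector stands in for a pretrained text embedding model, and the `chooser` callable is a placeholder for where an LLM call would go. All function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: descriptions and category paths are illustrative only.
OSM_TAGS = {
    "amenity=cafe": "cafe coffee shop serving drinks and snacks",
    "shop=bakery": "bakery shop selling bread and pastries",
    "leisure=park": "public park green space for recreation",
}
FS_CATEGORIES = {
    "Dining and Drinking > Cafe": "cafe coffee tea house drinks",
    "Dining and Drinking > Bakery": "bakery bread pastries cakes",
    "Landmarks and Outdoors > Park": "park outdoor green space recreation",
}
# Hypothetical gold standard, standing in for the manually curated benchmark.
GOLD = {
    "amenity=cafe": "Dining and Drinking > Cafe",
    "shop=bakery": "Dining and Drinking > Bakery",
    "leisure=park": "Landmarks and Outdoors > Park",
}


def embed(text):
    """Toy stand-in for a pretrained embedding model: word-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(tag_text, categories, k=2):
    """Return the k category paths most similar to the tag description."""
    query = embed(tag_text)
    scored = [(cosine(query, embed(desc)), path) for path, desc in categories.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first, ties by name
    return [path for _, path in scored[:k]]


def refine(tag, candidates, chooser=None):
    """Pick one candidate. `chooser` is a placeholder for an LLM-based selector;
    without it, the top-ranked embedding candidate is kept."""
    return chooser(tag, candidates) if chooser else candidates[0]


def map_tags(osm_tags, categories, k=2, chooser=None):
    """Map every OSM tag to a single FS category path."""
    return {
        tag: refine(tag, retrieve_candidates(desc, categories, k), chooser)
        for tag, desc in osm_tags.items()
    }


def accuracy_at_level(predicted, gold, level):
    """Share of tags whose predicted path matches gold on the first `level` levels."""
    def prefix(path):
        return path.split(" > ")[:level]
    hits = sum(prefix(predicted[tag]) == prefix(gold[tag]) for tag in gold)
    return hits / len(gold)


mapping = map_tags(OSM_TAGS, FS_CATEGORIES)
top_level_accuracy = accuracy_at_level(mapping, GOLD, level=1)
second_level_accuracy = accuracy_at_level(mapping, GOLD, level=2)
```

Evaluating accuracy separately per hierarchy level mirrors the way results are reported for the top FS level and for deeper levels. In a realistic setting, `embed` would be replaced by a pretrained embedding model and `chooser` by a prompt to an LLM that sees the tag and its retrieved candidates.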
