PlaceFM: A Training-free Geospatial Foundation Model of Places using Large-Scale Point of Interest Data

Mohammad Hashemi, Hossein Amiri, Andreas Zufle

TL;DR

PlaceFM is a training-free geospatial foundation model that derives multi-granular region embeddings and automatically identifies places from a large POI graph. It combines SD-CEM-based feature encoding, lightweight graph propagation, and clustering (bisecting k-means) to condense POI information into place embeddings, which a region-level aggregation step then combines into region embeddings. On ZIP code-level population density and housing price prediction, PlaceFM outperforms most state-of-the-art baselines while generating embeddings up to 100x faster, and its embeddings transfer robustly across downstream models. The work offers a scalable, interpretable framework for flexible geospatial analysis and sets the stage for incorporating additional modalities in future work.

Abstract

With the rapid growth and continual updates of geospatial data from diverse sources, geospatial foundation model pre-training for urban representation learning has emerged as a key research direction for advancing data-driven urban planning. Spatial structure is fundamental to effective geospatial intelligence systems; however, existing foundation models often lack the flexibility to reason about places: context-rich regions that span multiple spatial granularities and may consist of many spatially and semantically related points of interest. To address this gap, we propose PlaceFM, a geospatial foundation model that captures place representations through a training-free, clustering-based approach. PlaceFM summarizes the entire point of interest graph constructed from U.S. Foursquare data, producing general-purpose region embeddings while automatically identifying places of interest. These embeddings can be directly integrated into geolocation data pipelines to support a variety of urban downstream tasks. Without the need for costly pre-training, PlaceFM provides a scalable and efficient solution for multi-granular geospatial analysis. Extensive experiments on two real-world prediction tasks, ZIP code-level population density and housing prices, demonstrate that PlaceFM not only outperforms most state-of-the-art graph-based geospatial foundation models but also achieves up to a 100x speedup in generating region-level representations on large-scale POI graphs. The implementation is available at https://github.com/mohammadhashemii/PlaceFM.
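
To make the training-free pipeline concrete, here is a minimal sketch of its four stages (feature encoding, propagation, clustering into places, region aggregation) on synthetic data. It is an illustration under stated assumptions, not the authors' implementation: a one-hot category encoding stands in for the SD-CEM encoder, the graph is a small dense k-NN graph, clustering runs on coordinates concatenated with propagated features, and the aggregator is a plain mean. All function names are hypothetical.

```python
import numpy as np

def knn_adjacency(coords, k):
    """Symmetric, unweighted k-NN adjacency with self-loops (dense; demo only)."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    adj = np.maximum(adj, adj.T)             # symmetrize
    return adj + np.eye(n)

def propagate(adj, feats, hops=2):
    """Parameter-free propagation: repeated row-normalized neighbor averaging."""
    p = adj / adj.sum(axis=1, keepdims=True)
    out = feats
    for _ in range(hops):
        out = p @ out
    return out

def two_means(x, rng, iters=20):
    """Plain 2-means used as the bisection step; returns 0/1 labels."""
    centers = x[rng.choice(len(x), size=2, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in (0, 1):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean(axis=0)
    return labels

def bisecting_kmeans(x, n_clusters, seed=0):
    """Repeatedly split the cluster with the largest squared error in two."""
    rng = np.random.default_rng(seed)
    labels = np.zeros(len(x), dtype=int)
    for new_id in range(1, n_clusters):
        sse = [((x[labels == c] - x[labels == c].mean(0)) ** 2).sum()
               for c in range(new_id)]
        idx = np.flatnonzero(labels == int(np.argmax(sse)))
        if len(idx) < 2:                     # nothing left to split
            break
        half = two_means(x[idx], rng)
        if half.min() == half.max():         # degenerate split: alternate labels
            half = np.arange(len(idx)) % 2
        labels[idx[half == 1]] = new_id
    return labels

# Synthetic stand-in for the POIs of one region (assumption: 60 POIs, 5 categories).
rng = np.random.default_rng(42)
coords = rng.uniform(size=(60, 2))
cats = rng.integers(0, 5, size=60)
feats = np.eye(5)[cats]                      # one-hot stand-in for SD-CEM encoding

poi_emb = propagate(knn_adjacency(coords, k=4), feats, hops=2)
place_ids = bisecting_kmeans(np.hstack([coords, poi_emb]), n_clusters=6)
place_emb = np.stack([poi_emb[place_ids == c].mean(0) for c in range(6)])
region_emb = place_emb.mean(0)               # mean aggregator (assumption)
```

Nothing here is fitted by gradient descent: every stage is a closed-form or iterative geometric operation, which is the property the abstract credits for the speedup. The number of clusters plays the role of the place granularity.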

Paper Structure

This paper contains 28 sections, 12 equations, 6 figures, and 6 tables.

Figures (6)

  • Figure 1: The spatial distribution of POIs based on a uniform random sample of one million entries from the FSQ-19M dataset, covering the 48 contiguous U.S. states.
  • Figure 2: Pipeline of the proposed geospatial foundation model, PlaceFM. First, it builds POI-level graphs for each state, followed by category feature encoding and feature propagation to obtain neighborhood-aware POI embeddings. Places are then identified via training-free clustering at a chosen granularity, and an aggregator function produces the final region-level embedding.
  • Figure 3: Graph structures of a sample of POIs within ZIP code 10031 in Manhattan, NY: (a) 4-NN graph (unweighted), (b) 8-NN graph (unweighted), and (c) region-adaptive Delaunay triangulation (weighted). Edge opacity reflects weight, with darker edges indicating stronger connections.
  • Figure 4: Spatial distribution of absolute housing price estimation errors. The top figure shows ZIP-code regions in Vermont (VT), and the bottom figure shows those in Georgia (GA).
  • Figure 5: Efficiency comparison of region embedding generation.
  • ...and 1 more figure
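
The weighted Delaunay graph named in the Figure 3 caption can be sketched as follows. This is a hedged illustration only: the caption does not specify the weighting scheme, so the Gaussian kernel on edge length with the median edge length as bandwidth is an assumption, chosen so that weights adapt to the local POI density of the region. The helper names are hypothetical; `scipy.spatial.Delaunay` is the only library routine used.

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_edges(coords):
    """Return the sorted, de-duplicated undirected edges of the Delaunay triangulation."""
    tri = Delaunay(coords)
    edges = set()
    for a, b, c in tri.simplices:            # each simplex is a triangle in 2-D
        for u, v in ((a, b), (b, c), (a, c)):
            u, v = int(u), int(v)
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)

def weight_edges(coords, edges):
    """Gaussian kernel on edge length; bandwidth = median edge length (assumption)."""
    e = np.array(edges)
    length = np.linalg.norm(coords[e[:, 0]] - coords[e[:, 1]], axis=1)
    sigma = np.median(length)
    return np.exp(-(length / sigma) ** 2)    # shorter edges get weights closer to 1

# Synthetic POI coordinates for one region (assumption: 30 points in the unit square).
rng = np.random.default_rng(0)
coords = rng.uniform(size=(30, 2))
edges = delaunay_edges(coords)
weights = weight_edges(coords, edges)
```

Unlike a k-NN graph, the triangulation needs no choice of k and never connects across an intervening point, so sparse and dense areas both receive a connected neighborhood structure.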

Theorems & Definitions (1)

  • Definition 1: Place