Towards Effective Next POI Prediction: Spatial and Semantic Augmentation with Remote Sensing Data
Nan Jiang, Haitao Yuan, Jianing Si, Minxiao Chen, Shangguang Wang
TL;DR
This work tackles next POI prediction by jointly modeling spatial and semantic intents through a two-step framework (TSPN-RA) that augments urban region representations with remote sensing imagery and captures historical knowledge via a region quad-tree-based QR-P graph. It introduces tile and POI embeddings, a historical graph knowledge encoder, and attention-based fusion to produce tile-level and POI-level predictions, predicting spatial zones first and then specific POIs within those zones. Comprehensive experiments on four real-world datasets show consistent improvements over strong baselines in both accuracy and efficiency, with ablation studies highlighting the critical roles of the QR-P graph, remote sensing augmentation, and the two-step strategy. The approach offers practical benefits for real-time, geo-aware recommendations in urban settings, enabling better environmental awareness, scalable spatial partitioning, and richer contextual understanding of user mobility.
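To illustrate the region quad-tree idea, here is a minimal sketch that recursively splits a bounding box into four quadrants until each tile holds at most a given number of POIs. It is an illustration under simplifying assumptions, not the authors' construction. The split rule uses POI counts only, whereas the described method also integrates remote sensing and road network information. The names `QuadNode`, `build_quadtree`, `leaves`, `capacity`, and `max_depth` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (lon, lat)


@dataclass
class QuadNode:
    """A rectangular tile with lower-left corner (x0, y0) and upper-right corner (x1, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float
    depth: int = 0
    points: List[Point] = field(default_factory=list)
    children: List["QuadNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def build_quadtree(node: QuadNode, capacity: int = 4, max_depth: int = 8) -> QuadNode:
    """Hypothetical split rule: divide a tile into four quadrants while it holds
    more than `capacity` POIs and the depth limit has not been reached."""
    if len(node.points) <= capacity or node.depth >= max_depth:
        return node
    xm, ym = (node.x0 + node.x1) / 2, (node.y0 + node.y1) / 2
    quads = [
        QuadNode(node.x0, node.y0, xm, ym, node.depth + 1),  # lower-left
        QuadNode(xm, node.y0, node.x1, ym, node.depth + 1),  # lower-right
        QuadNode(node.x0, ym, xm, node.y1, node.depth + 1),  # upper-left
        QuadNode(xm, ym, node.x1, node.y1, node.depth + 1),  # upper-right
    ]
    for x, y in node.points:
        # Assign each POI to exactly one quadrant.
        idx = (1 if x >= xm else 0) + (2 if y >= ym else 0)
        quads[idx].points.append((x, y))
    node.children = quads
    node.points = []  # POIs now live in the leaves only
    for child in quads:
        build_quadtree(child, capacity, max_depth)
    return node


def leaves(node: QuadNode) -> List[QuadNode]:
    """Collect the leaf tiles, which serve as the spatial units (tiles)."""
    if node.is_leaf():
        return [node]
    out: List[QuadNode] = []
    for child in node.children:
        out.extend(leaves(child))
    return out


# Usage: dense areas get small tiles, sparse areas stay coarse.
pois = [(0.1, 0.1), (0.2, 0.15), (0.12, 0.3), (0.3, 0.2), (0.8, 0.9)]
root = build_quadtree(QuadNode(0.0, 0.0, 1.0, 1.0, points=pois), capacity=2)
tiles = leaves(root)
```

Splitting adaptively by density is one way to handle the uneven POI distribution. Crowded downtown areas receive fine-grained tiles, while sparse areas remain as a few large tiles.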
Abstract
The next point-of-interest (POI) prediction is a significant task in location-based services. It is complex because it must combine a user's spatial and semantic intent. This combination is influenced by historical preferences, the current location, and environmental factors, which makes it challenging to model. In addition, the uneven distribution of POIs further complicates next POI prediction. To address these challenges, we enrich the input features and propose an effective deep-learning method within a two-step prediction framework. Our method first incorporates remote sensing data, which captures pivotal environmental context, to enhance the input features with respect to both location and semantics. We then employ a region quad-tree structure to integrate the urban remote sensing, road network, and POI distribution spaces, with the aim of obtaining a more coherent graph representation of urban space. Using this representation, we construct the QR-P graph over the user's historical trajectories to encapsulate historical travel knowledge, thereby augmenting the input features with comprehensive spatial and semantic information. We devise distinct embedding modules to encode these features and employ an attention mechanism to fuse the resulting encodings. In the two-step prediction procedure, we first identify potential spatial zones by predicting the user's preferred tiles, and then pinpoint specific POIs of a designated type within the predicted tiles. Empirical results on four real-world location-based social network datasets show that our approach outperforms competitive baseline methods.
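The two-step procedure (tiles first, then POIs of a designated type inside those tiles) can be illustrated with a minimal sketch. Several assumptions apply. The tile and POI scores are taken as given, standing in for the model's outputs. The designated POI type is treated as a known input. The function `two_step_predict` and its parameters `k_tiles` and `k_pois` are hypothetical, not the authors' interface.

```python
from typing import Dict, List, Tuple


def two_step_predict(
    tile_scores: Dict[str, float],
    poi_scores: Dict[str, float],
    poi_tile: Dict[str, str],
    poi_category: Dict[str, str],
    target_category: str,
    k_tiles: int = 2,
    k_pois: int = 3,
) -> Tuple[List[str], List[str]]:
    """Hypothetical two-step ranking: pick top tiles, then rank POIs inside them."""
    # Step 1: keep the k highest-scoring tiles (the candidate spatial zones).
    top_tiles = sorted(tile_scores, key=lambda t: tile_scores[t], reverse=True)[:k_tiles]
    allowed = set(top_tiles)
    # Step 2: rank only POIs of the designated category located in those tiles.
    candidates = [
        p for p in poi_scores
        if poi_tile[p] in allowed and poi_category[p] == target_category
    ]
    ranked = sorted(candidates, key=lambda p: poi_scores[p], reverse=True)[:k_pois]
    return top_tiles, ranked


# Usage with made-up scores, for illustration only.
tile_scores = {"t1": 0.7, "t2": 0.2, "t3": 0.1}
poi_scores = {"cafe_a": 0.9, "cafe_b": 0.4, "gym_c": 0.8, "cafe_d": 0.95}
poi_tile = {"cafe_a": "t1", "cafe_b": "t2", "gym_c": "t1", "cafe_d": "t3"}
poi_category = {"cafe_a": "cafe", "cafe_b": "cafe", "gym_c": "gym", "cafe_d": "cafe"}
tiles, pois = two_step_predict(tile_scores, poi_scores, poi_tile, poi_category, "cafe")
```

In this toy input, `cafe_d` has the highest POI score but is discarded because its tile `t3` is not among the top two tiles. Restricting step 2 to the predicted tiles shrinks the candidate set. This is one plausible reason a tile-first strategy can help both efficiency and spatial consistency.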
