Next Point-of-interest (POI) Recommendation Model Based on Multi-modal Spatio-temporal Context Feature Embedding
Lingyu Zhang, Pengfei Xu, Rui Ban, Zhenchao Zhang, Songtao Liu, Yan Wang, Yunhai Wang
TL;DR
The paper tackles next-point-of-interest prediction for individual mobility by modeling both long-term habitual patterns and short-term contextual intentions with a semantic-embedding-based dual-stream spatiotemporal attention framework. It introduces semantically enriched trajectory preprocessing, differentiated multimodal embeddings, and an environment-aware fusion mechanism that dynamically balances the two streams, followed by attention-based matching that outputs a probability distribution over candidate locations. Empirical results on the real-world Foursquare NYC and TKY datasets show state-of-the-art performance, with ablations confirming the benefits of activity duration modeling and dual-stream representations. The approach offers a practical, interpretable solution for timely, high-precision mobility prediction, supporting smarter dispatching and pricing in intelligent mobility platforms. The authors identify adaptive per-user partitioning as an opportunity for future work.
Abstract
Predicting the next pickup location of individual users is a fundamental problem in intelligent mobility systems, which requires modeling personalized travel behaviors under complex spatiotemporal contexts. Existing methods mainly learn sequential dependencies from raw trajectories, but often fail to capture high-level behavioral semantics and to effectively disentangle long-term habitual preferences from short-term contextual intentions. In this paper, we propose a semantic-embedding-based dual-stream spatiotemporal attention model for next pickup location prediction. Raw trajectories are first transformed into semantically enriched activity sequences that encode users' stay behaviors and movement semantics. A dual-stream architecture is then designed to explicitly decouple long-term historical patterns from short-term dynamic intentions, with each stream employing spatiotemporal attention mechanisms to model dependencies at a different temporal scale. To integrate heterogeneous contextual information, a context-aware dynamic fusion module adaptively balances the contributions of the two streams. Finally, an attention-based matching strategy predicts the probability distribution over candidate pickup locations. Experiments on real-world ride-hailing datasets demonstrate that the proposed model consistently outperforms state-of-the-art methods, validating the effectiveness of semantic trajectory abstraction and dual-stream spatiotemporal attention for individualized mobility behavior modeling.
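To make the pipeline described above more concrete, the following is a minimal illustrative sketch of the general idea: two attention-pooled streams, a context-dependent gate that blends them, and a softmax over candidate locations. It is not the authors' implementation. Every name here (`attend`, `fuse`, `predict`, `gate_weights`), the choice of a single scalar sigmoid gate, and the use of the most recent activity embedding as the attention query are assumptions made for illustration only. A real model would learn the embeddings and gate parameters rather than take them as inputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys):
    """Scaled dot-product attention pooling; keys double as values (assumption)."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(len(query))]


def fuse(long_vec, short_vec, context, gate_weights, gate_bias=0.0):
    """Blend the two streams with a scalar sigmoid gate driven by the context vector."""
    g = 1.0 / (1.0 + math.exp(-(dot(gate_weights, context) + gate_bias)))
    fused = [g * l + (1.0 - g) * s for l, s in zip(long_vec, short_vec)]
    return fused, g


def predict(long_history, short_history, context, candidates, gate_weights):
    """Return (probabilities over candidates, gate value) for one user."""
    query = short_history[-1]  # assumption: latest activity embedding is the query
    long_vec = attend(query, long_history)    # long-term habitual stream
    short_vec = attend(query, short_history)  # short-term intention stream
    fused, gate = fuse(long_vec, short_vec, context, gate_weights)
    probs = softmax([dot(fused, c) for c in candidates])  # matching step
    return probs, gate
```

In this sketch, a gate value near 1 lets the long-term stream dominate the prediction and a value near 0 favors the short-term stream, which is one simple way to realize the "adaptive balancing" the abstract describes.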
