Semantic Trajectory Data Mining with LLM-Informed POI Classification
Yifan Liu, Chenchen Kuai, Haoxuan Ma, Xishun Liao, Brian Yueshuai He, Jiaqi Ma
TL;DR
This paper tackles the lack of semantic context in GPS-based trajectory data by introducing a two-stage framework that first uses large language models (LLMs) to classify points of interest (POIs) by activity type and then applies a Bayesian approach to infer the activity at each stay point. The method is designed to work with incomplete POI features and without additional training, demonstrating robustness across diverse datasets (LA County and Egypt) using OpenStreetMap data. Key contributions include the first application of LLMs to POI classification, a probabilistic activity inference mechanism that combines nearby POIs and temporal priors, and strong empirical results (93.4% accuracy with a 96.1% F-1 score in POI classification, and 91.7% accuracy with a 92.3% F-1 score in activity inference). The work advances semantic trajectory mining with practical implications for transportation systems, local search, and mobility research, while addressing data quality challenges and privacy considerations.
Abstract
Human travel trajectory mining is crucial for transportation systems: it supports route optimization, traffic management, and the study of human travel patterns. Previous rule-based approaches that do not integrate semantic information are limited in both efficiency and accuracy. Semantic information, such as activity types inferred from Point of Interest (POI) data, can significantly enhance the quality of trajectory mining. However, integrating this information is challenging, because many POIs have incomplete feature information and current learning-based POI classification algorithms require complete datasets. In this paper, we introduce a novel pipeline for human travel trajectory mining. Our approach first leverages the strong inferential and comprehension capabilities of large language models (LLMs) to annotate POIs with activity types and then uses a Bayesian-based algorithm to infer the activity for each stay point in a trajectory. In our evaluation using the OpenStreetMap (OSM) POI dataset, our approach achieves 93.4% accuracy and a 96.1% F-1 score in POI classification, and 91.7% accuracy with a 92.3% F-1 score in activity inference.
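
To make the second stage concrete, the following is a minimal sketch of how a Bayesian stay-point inference step might combine a temporal prior with evidence from nearby, already-labeled POIs. It is not the paper's algorithm: the function name `infer_activity`, the exponential distance-decay likelihood, the `decay_m` parameter, the time buckets, and every number in `TEMPORAL_PRIOR` are illustrative assumptions, and the POI labels are simply assumed to come from an upstream LLM classifier.

```python
import math
from collections import defaultdict

# Hypothetical temporal prior P(activity | time bucket).
# These values are made up for illustration and are not from the paper.
TEMPORAL_PRIOR = {
    "morning": {"work": 0.6, "dining": 0.1, "shopping": 0.3},
    "evening": {"work": 0.1, "dining": 0.6, "shopping": 0.3},
}


def infer_activity(nearby_pois, time_bucket, decay_m=50.0):
    """Return a posterior over activities for one stay point.

    nearby_pois: list of (activity_label, distance_in_meters) pairs; the
        labels are assumed to be produced by an upstream POI classifier.
    time_bucket: key into TEMPORAL_PRIOR.
    decay_m: assumed length scale (meters) of the distance-decay kernel.
    """
    prior = TEMPORAL_PRIOR[time_bucket]

    # Assumed likelihood: each nearby POI votes for its activity with a
    # weight that decays exponentially with distance from the stay point.
    likelihood = defaultdict(float)
    for activity, dist in nearby_pois:
        likelihood[activity] += math.exp(-dist / decay_m)

    # Bayes' rule up to a constant: posterior is prior times likelihood.
    unnorm = {a: prior[a] * likelihood.get(a, 0.0) for a in prior}
    total = sum(unnorm.values())
    if total == 0.0:
        # No usable POI evidence: fall back to the temporal prior alone.
        return dict(prior)
    return {a: v / total for a, v in unnorm.items()}


if __name__ == "__main__":
    pois = [("dining", 10.0), ("work", 10.0)]
    print(infer_activity(pois, "morning"))
    print(infer_activity(pois, "evening"))
```

With the same two equidistant POIs, the sketch favors "work" in the morning and "dining" in the evening, which illustrates how a temporal prior can disambiguate a stay point when spatial evidence alone is inconclusive.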
