Understanding the Influence of Data Characteristics on the Performance of Point-of-Interest Recommendation Algorithms
Linus W. Dietz, Pablo Sánchez, Alejandro Bellogín
TL;DR
The paper tackles how data characteristics shape POI recommender performance, proposing a domain-tailored explanatory framework that links data properties to outcomes in accuracy, novelty, and item exposure. It extends prior frameworks with 32 POI-specific explanatory variables, introduces a domain-driven subsampling method, and evaluates 7 competitive algorithms across 144 NYC-based subsamples using a regression analysis. Key findings show that data structure (e.g., Shape, Density) and check-in distribution (Gini_U, StPB, KuPB) strongly affect performance, while spatio-temporal factors like Radius of Gyration and Duration Active also play critical roles. The study provides practical guidance for algorithm selection in e-tourism, emphasizes beyond-accuracy metrics, and suggests avenues for integrating data-characteristic awareness into automated recommender systems.
Abstract
Point-of-interest (POI) recommendations are essential for travelers and the e-tourism business. They assist in decision-making regarding what venues to visit and where to dine and stay. While it is known that traditional recommendation algorithms' performance depends on data characteristics like sparsity, popularity bias, and preference distributions, the impact of these data characteristics has not been systematically studied in the POI recommendation domain. To fill this gap, we extend a previously proposed explanatory framework by introducing new explanatory variables specifically relevant to POI recommendation. At its core, the framework relies on having subsamples with different data characteristics to compute a regression model, which reveals the dependencies between data characteristics and performance metrics of recommendation models. To obtain these subsamples, we subdivide a POI recommendation data set on New York City and measure the effect of these characteristics on different classical POI recommendation algorithms in terms of accuracy, novelty, and item exposure. Our findings confirm the crucial role of key data features like density, popularity bias, and the distribution of check-ins in POI recommendation. Additionally, we identify the significance of novel factors, such as user mobility and the duration of user activity. In summary, our work presents a generic method to quantify the influence of data characteristics on recommendation performance. The results not only show why certain POI recommendation algorithms excel in specific recommendation problems derived from an LBSN check-in data set in New York City, but also offer practical insights into which data characteristics need to be addressed to achieve better recommendation performance.
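To make the core idea concrete, the following is a minimal sketch, not the authors' implementation, of how a few structural characteristics could be computed for one subsample and how a performance metric could then be regressed on one of them. The function names (`gini`, `data_characteristics`, `fit_line`) and the toy check-ins are hypothetical. The definitions used here (Shape as users per POI, Density as the fraction of observed user-POI pairs, Gini_U over per-user check-in counts) follow common usage in explanatory frameworks and are assumptions; the paper's exact definitions may differ, and its regression uses many explanatory variables rather than one.

```python
from collections import Counter


def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def data_characteristics(checkins):
    """Compute assumed characteristics from (user_id, poi_id) check-ins."""
    checkins = list(checkins)
    pairs = set(checkins)  # repeated visits count once for density
    users = {u for u, _ in pairs}
    pois = {p for _, p in pairs}
    per_user = Counter(u for u, _ in checkins)  # repeats included
    return {
        "shape": len(users) / len(pois),
        "density": len(pairs) / (len(users) * len(pois)),
        "gini_u": gini(per_user.values()),
    }


def fit_line(xs, ys):
    """Ordinary least squares for one explanatory variable.

    Returns (slope, intercept). A single-variable simplification
    of the multi-variable regression described in the text.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


# Toy check-ins, invented purely for illustration.
toy_subsample = [("u1", "a"), ("u1", "b"), ("u1", "a"), ("u2", "a"), ("u3", "c")]
print(data_characteristics(toy_subsample))
```

In use, `data_characteristics` would be applied to each subsample, and `fit_line` (or a multi-variable equivalent) would take one characteristic across all subsamples as `xs` and an algorithm's measured metric on those subsamples as `ys`. The sign and size of the slope then indicate how that characteristic relates to performance.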
