Understanding the Influence of Data Characteristics on the Performance of Point-of-Interest Recommendation Algorithms

Linus W. Dietz, Pablo Sánchez, Alejandro Bellogín

TL;DR

The paper tackles how data characteristics shape POI recommender performance, proposing a domain-tailored explanatory framework that links data properties to outcomes in accuracy, novelty, and item exposure. It extends prior frameworks with 32 POI-specific explanatory variables, introduces a domain-driven subsampling method, and evaluates 7 competitive algorithms across 144 NYC-based subsamples using a regression analysis. Key findings show that data structure (e.g., Shape, Density) and check-in distribution (Gini_U, StPB, KuPB) strongly affect performance, while spatio-temporal factors like Radius of Gyration and Duration Active also play critical roles. The study provides practical guidance for algorithm selection in e-tourism, emphasizes beyond-accuracy metrics, and suggests avenues for integrating data-characteristic awareness into automated recommender systems.

Abstract

Point-of-interest (POI) recommendations are essential for travelers and the e-tourism business. They assist in decision-making regarding what venues to visit and where to dine and stay. While it is known that traditional recommendation algorithms' performance depends on data characteristics like sparsity, popularity bias, and preference distributions, the impact of these data characteristics has not been systematically studied in the POI recommendation domain. To fill this gap, we extend a previously proposed explanatory framework by introducing new explanatory variables specifically relevant to POI recommendation. At its core, the framework relies on having subsamples with different data characteristics to compute a regression model, which reveals the dependencies between data characteristics and performance metrics of recommendation models. To obtain these subsamples, we subdivide a POI recommendation data set on New York City and measure the effect of these characteristics on different classical POI recommendation algorithms in terms of accuracy, novelty, and item exposure. Our findings confirm the crucial role of key data features like density, popularity bias, and the distribution of check-ins in POI recommendation. Additionally, we identify the significance of novel factors, such as user mobility and the duration of user activity. In summary, our work presents a generic method to quantify the influence of data characteristics on recommendation performance. The results not only show why certain POI recommendation algorithms excel in specific recommendation problems derived from an LBSN check-in data set in New York City, but also offer practical insights into which data characteristics need to be addressed to achieve better recommendation performance.
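
To make the notion of "data characteristics" concrete, the following is a minimal sketch of how three of the named explanatory variables (Density, Shape, and Gini_U) might be computed for one subsample. The formulas are assumptions based on common usage in the recommender-systems literature, since this excerpt does not define them; the function names `gini` and `data_characteristics` and the toy check-ins are hypothetical.

```python
from collections import Counter


def gini(values):
    """Gini coefficient of non-negative counts (0 = perfectly equal)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard closed form on ascending-sorted values with 1-based ranks.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def data_characteristics(checkins):
    """Summarize a non-empty list of (user_id, poi_id) check-ins.

    Assumed definitions (not taken from the paper):
      density = distinct user-POI pairs / (users * POIs)
      shape   = users / POIs
      gini_u  = Gini coefficient of check-in counts per user
    """
    checkins = list(checkins)
    users = {u for u, _ in checkins}
    pois = {p for _, p in checkins}
    per_user = Counter(u for u, _ in checkins)
    return {
        "density": len(set(checkins)) / (len(users) * len(pois)),
        "shape": len(users) / len(pois),
        "gini_u": gini(per_user.values()),
    }


# Toy check-ins, invented for illustration only.
toy = [("u1", "a"), ("u1", "b"), ("u1", "a"), ("u2", "a"), ("u3", "c")]
stats = data_characteristics(toy)
print(stats)
```

Repeated visits count toward the per-user distribution but not toward density in this sketch; whether the paper treats repeated check-ins the same way is not stated in this excerpt.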

Paper Structure

This paper contains 52 sections, 15 equations, 7 figures, and 13 tables.

Figures (7)

  • Figure 1: Heat map of the visited venues in New York with different parameters for the $k$-core and the origin of the users (better viewed in color).
  • Figure 2: Diagram representing the methodology followed in the paper. Each number corresponds to a step in the process: Initially, we clean the original check-in data set (1) to obtain a recommendation data set, which we subsequently subdivide into the 144 subsamples (2). Each subsample is further split into training and test sets (3), where a recommendation model is trained on each subsample individually (4) and the explanatory variables are computed based on the training sets (5). We determine the best hyperparameters of each recommender in each subsample using the test set (6), and record the metrics of the best performing recommendation configuration (7). Finally, we perform the regression analysis towards the performance metrics of each recommendation algorithm using the explanatory variables (8).
  • Figure 3: The recommendation outcomes using the following metrics: a) nDCG@5, b) EPC@5, and c) Item Exposure@5. The boxplot indicates the 25% quantile, the median, and the 75% quantile. Overlaid is a violin plot emphasizing the density of values and the overall range of the outcomes, with the mean value marked by an x. The dashed line in the nDCG plot of Figure 3a indicates the mean value of the Popularity algorithm.
  • Figure 4: Coefficient plot for nDCG@5. StRG has a negative influence on nDCG@5, Shape, KuPB, and MedDA are mostly neutral, whereas the other EVs have a positive impact.
  • Figure 5: Coefficient plot for novelty measured using EPC@5. We observe an inverse relationship compared to nDCG@5, with Density, StPB, and KuDA having a negative impact and Shape, Gini$_U$, and MedDA having a positive impact on the EPC metric.
  • ...and 2 more figures
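
The final step of the methodology in Figure 2, regressing a performance metric on the explanatory variables across subsamples, can be sketched as follows. This is an illustrative ordinary least squares fit on standardized variables, written in plain Python; the paper's exact regression setup (variable selection, scaling, diagnostics) is not given in this excerpt, so the helper names `standardize`, `solve`, and `ols` and all numbers are assumptions invented for the example.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale a column to zero mean and unit (population) variance."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(ev_columns, metric):
    """Return [intercept, coef_1, ...] for metric ~ standardized EVs."""
    cols = [[1.0] * len(metric)] + [standardize(c) for c in ev_columns]
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * y for p, y in zip(ci, metric)) for ci in cols]
    return solve(xtx, xty)


# One value per subsample; toy numbers, NOT results from the paper.
density = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06]
gini_u = [0.50, 0.30, 0.60, 0.20, 0.40, 0.70]
ndcg = [0.1 + 2.0 * d - 0.05 * g for d, g in zip(density, gini_u)]

intercept, b_density, b_gini = ols([density, gini_u], ndcg)
print(intercept, b_density, b_gini)
```

Because the explanatory variables are standardized, the sign and magnitude of each coefficient can be compared across variables, which is the kind of reading the coefficient plots in Figures 4 and 5 support.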

Theorems & Definitions (10)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Definition 7
  • Definition 8
  • Definition 9
  • Definition 10