Measuring the Predictability of Recommender Systems using Structural Complexity Metrics
Alfonso Valderrama, Andrés Abeliuk
TL;DR
The paper quantifies the inherent predictability of recommender systems by treating the user-item rating matrix $M$ as a structural object and measuring how stable its structure is under perturbation. It introduces two data-driven metrics, Analytical Structural Consistency (ASC) and Empirical Structural Consistency (ESC), derived from SVD-based perturbations and matrix factorization, respectively, and compares them against the RMSE of the best-performing collaborative filtering (CF) algorithms on real and synthetic data. On real data, both metrics correlate strongly with predictive performance (e.g., Pearson $r=0.968$ and $r=0.924$), and ESC remains robust on synthetic data where ASC may fail. The work suggests that these metrics can guide algorithm selection and monitor system evolution, while noting their computational cost and pointing to future work on scalability and broader validation.
Abstract
Recommender systems (RS) are central to the filtering and curation of online content. These algorithms predict user ratings for unseen items based on past preferences. Despite their importance, the innate predictability of RS has received limited attention. This study introduces data-driven metrics to measure the predictability of RS based on the structural complexity of the user-item rating matrix. A low predictability score indicates complex and unpredictable user-item interactions, while a high predictability score reveals less complex patterns with predictive potential. We propose two strategies that use singular value decomposition (SVD) and matrix factorization (MF) to measure structural complexity. By perturbing the data and evaluating the prediction of the perturbed version, we explore the structural consistency indicated by the SVD singular vectors. The assumption is that a random perturbation of highly structured data does not change its structure. Empirical results show a high correlation between our metrics and the accuracy of the best-performing prediction algorithms on real data sets.
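The perturbation idea in the abstract can be illustrated with a minimal sketch. This is not the paper's ASC or ESC definition. It is a simplified stand-in that assumes a dense matrix, simulates the perturbation by zeroing a random fraction of entries, and scores consistency as the overlap between the top-$k$ left singular subspaces before and after perturbation. The function name `subspace_consistency` and the parameters `k`, `drop_frac`, and `n_trials` are hypothetical choices made for illustration.

```python
import numpy as np


def subspace_consistency(M, k=3, drop_frac=0.1, n_trials=10, seed=0):
    """Average overlap (in [0, 1]) between the top-k left singular
    subspace of M and that of randomly perturbed copies of M.

    Illustrative assumption: a perturbation zeroes out a random
    fraction `drop_frac` of the entries of M.
    """
    rng = np.random.default_rng(seed)
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    U_k = U[:, :k]

    scores = []
    for _ in range(n_trials):
        # Keep each entry independently with probability 1 - drop_frac.
        keep = rng.random(M.shape) >= drop_frac
        U_p, _, _ = np.linalg.svd(M * keep, full_matrices=False)
        # Squared Frobenius norm of U_k^T U_p_k equals the sum of squared
        # cosines of the principal angles; dividing by k maps it to [0, 1].
        overlap = np.linalg.norm(U_k.T @ U_p[:, :k], "fro") ** 2 / k
        scores.append(overlap)
    return float(np.mean(scores))


# Toy comparison: a rank-3 matrix versus unstructured Gaussian noise.
rng = np.random.default_rng(42)
structured = rng.normal(size=(60, 3)) @ rng.normal(size=(3, 40))
unstructured = rng.normal(size=(60, 40))

print("structured:  ", subspace_consistency(structured))
print("unstructured:", subspace_consistency(unstructured))
```

In this toy setting the low-rank matrix should score higher than the noise matrix, which mirrors the stated assumption that a random perturbation of highly structured data leaves its structure largely intact. Real rating matrices are sparse with missing entries, so any practical version would need to handle unobserved ratings explicitly.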
