Whom do Explanations Serve? A Systematic Literature Survey of User Characteristics in Explainable Recommender Systems Evaluation
Kathrin Wardatzky, Oana Inel, Luca Rossetto, Abraham Bernstein
TL;DR
This paper asks who explanations in recommender systems actually serve, through a systematic survey of 124 papers (2017–2022) that evaluated explanations in user studies. It analyzes participant descriptions (demographics, personality, experience) and how these characteristics affect explanation outcomes, finding a substantial WEIRD (Western, Educated, Industrialized, Rich, Democratic) bias that threatens generalizability and inconsistent reporting that impairs reproducibility. Evidence for characteristic-driven effects is sparse and heterogeneous: a few characteristics (e.g., social awareness) show consistent influences on transparency and trust, but the overall results are inconclusive. The paper gives concrete recommendations for recruitment, reporting, and reproducibility (including pre-registration and FAIR data sharing) to move toward inclusive, comparable, and reusable evaluations of explainable recommender systems.
Abstract
Adding explanations to recommender systems is said to have multiple benefits, such as increasing user trust or system transparency. Previous work from other application areas suggests that specific user characteristics impact the users' perception of the explanation. However, we rarely find this type of evaluation for recommender system explanations. This paper addresses this gap by surveying 124 papers in which recommender system explanations were evaluated in user studies. We analyzed their participant descriptions and, where the impact of user characteristics on the explanation effects was measured, their study results. Our findings suggest that the results from the surveyed studies predominantly cover specific users who do not necessarily represent the users of recommender systems in the evaluation domain. This may seriously hamper the generalizability of any insights we may gain from current studies on explanations in recommender systems. We further find inconsistencies in the data reporting, which impair the reproducibility of the reported results. Hence, we recommend actions to move toward a more inclusive and reproducible evaluation.
