Whom do Explanations Serve? A Systematic Literature Survey of User Characteristics in Explainable Recommender Systems Evaluation

Kathrin Wardatzky, Oana Inel, Luca Rossetto, Abraham Bernstein

TL;DR

This paper tackles the question of who explanations in recommender systems actually serve by conducting a systematic survey of 124 papers (2017–2022) that evaluated explanations in user studies. It analyzes participant descriptions (demographics, personality, experience) and how these characteristics affect explanation outcomes, finding substantial WEIRD bias and inconsistent reporting that threaten generalizability. The study identifies sparse and heterogeneous evidence for characteristic-driven effects, with some indications (e.g., social awareness) showing consistent influences on transparency and trust, but overall results are inconclusive. It provides concrete recommendations to improve recruitment, reporting, and reproducibility (including pre-registration and FAIR data sharing) to move toward inclusive, comparable, and reusable evaluations of explainable recommender systems.

Abstract

Adding explanations to recommender systems is said to have multiple benefits, such as increasing user trust or system transparency. Previous work from other application areas suggests that specific user characteristics impact the users' perception of the explanation. However, we rarely find this type of evaluation for recommender systems explanations. This paper addresses this gap by surveying 124 papers in which recommender systems explanations were evaluated in user studies. We analyzed their participant descriptions and study results where the impact of user characteristics on the explanation effects was measured. Our findings suggest that the results from the surveyed studies predominantly cover specific users who do not necessarily represent the users of recommender systems in the evaluation domain. This may seriously hamper the generalizability of any insights we may gain from current studies on explanations in recommender systems. We further find inconsistencies in the data reporting, which impacts the reproducibility of the reported results. Hence, we recommend actions to move toward a more inclusive and reproducible evaluation.

Paper Structure

This paper contains 47 sections, 2 figures, 6 tables.

Figures (2)

  • Figure 1: Different ways of reporting the participants' age. Each visualization corresponds to how the information was reported in the papers, where one row represents one study sorted by lower-bound age. Black bars denote age ranges, and combined black and gray bars denote total and 'dominant' ranges. Blue curves denote centered distributions with two standard deviations around a mean. Yellow curves denote distributions with explicit lower and upper bounds. Green lines represent explicit age histogram buckets, and diamonds $\blacklozenge$ represent mean values.
  • Figure 2: Number of studies involving participants from different countries