Can Offline Metrics Measure Explanation Goals? A Comparative Survey Analysis of Offline Explanation Metrics in Recommender Systems

André Levi Zanon, Marcelo Garcia Manzato, Leonardo Rocha

TL;DR

The paper tackles the challenge of evaluating explanation goals in recommender systems (RSs) using offline metrics. It studies path-based explanations, which connect interacted and recommended items via shared attributes, and systematically examines how the choice of attributes and interacted items affects user perception across three KG-agnostic algorithms and six RSs. Using a two-stage approach (offline path-metric evaluation on the MovieLens and LastFM knowledge graph (KG) datasets, followed by an online user study), the authors find only partial alignment: attribute diversity strongly influences engagement, while both popularity and diversity affect transparency and trust. The findings highlight a gap between current offline metrics and actual user perception, and the authors propose guidelines and directions for developing offline metrics that better reflect explanation goals and user understanding.

Abstract

In Recommender Systems (RSs), explanations help users understand why items are recommended and can enhance a system's transparency, persuasiveness, engagement, and trust, which are known as explanation goals. However, evaluating the effectiveness of explanation algorithms offline remains challenging because explanation goals are inherently subjective. We first conducted a rapid literature review, which revealed that algorithms are often assessed using anecdotal evidence (offering convincing examples) or using metrics that do not align with human perception. Based on these results, we investigated whether the selection of item attributes and interacted items affects explanation goals in explanations that generate a path connecting interacted and recommended items through shared attributes (such as genres). We used metrics that measure the diversity and popularity of attributes and the recency of item interactions to evaluate explanations from three state-of-the-art agnostic algorithms across six recommender systems. We then performed an online user study to compare user perceptions of explanation goals with the offline metrics. Our findings indicate that engagement is sensitive to users' perceptions of diversity in explanations, whereas transparency, trust, and persuasiveness are influenced by perceptions of both popularity and diversity. However, offline metrics require refinement to align more closely with explanation goals and user understanding.
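
To make the three families of offline metrics named in the abstract more concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the path representation, the toy data, the function names, and the exact formulas (share of distinct attributes, max-scaled attribute frequency, min-max normalized timestamps) are all illustrative assumptions, since this page does not give the paper's definitions.

```python
# Hypothetical toy data: each explanation is a path
# (interacted item, shared attribute, recommended item).
paths = [
    ("Toy Story", "genre:Animation", "Finding Nemo"),
    ("Toy Story", "genre:Animation", "Shrek"),
    ("Alien", "director:Ridley Scott", "Blade Runner"),
]

# Hypothetical attribute -> number of items linked to it in the knowledge graph.
attribute_item_counts = {"genre:Animation": 50, "director:Ridley Scott": 5}

# Hypothetical interaction timestamps for one user (larger means more recent).
interaction_time = {"Toy Story": 100, "Alien": 300, "Heat": 200}


def attribute_diversity(paths):
    """Assumed definition: share of distinct attributes among all paths."""
    attributes = [attr for _, attr, _ in paths]
    return len(set(attributes)) / len(attributes)


def attribute_popularity(paths, counts):
    """Assumed definition: mean attribute frequency, scaled by the maximum."""
    max_count = max(counts.values())
    return sum(counts[attr] / max_count for _, attr, _ in paths) / len(paths)


def interaction_recency(paths, times):
    """Assumed definition: mean min-max normalized timestamp of the
    interacted items used in the paths (needs two distinct timestamps)."""
    oldest, newest = min(times.values()), max(times.values())
    span = newest - oldest
    return sum((times[item] - oldest) / span for item, _, _ in paths) / len(paths)


diversity = attribute_diversity(paths)
popularity = attribute_popularity(paths, attribute_item_counts)
recency = interaction_recency(paths, interaction_time)
```

In this toy setting, two of the three explanations reuse the same popular genre and the same older interaction, so the sketch yields moderate diversity, high popularity, and low recency. Scores of this kind are what the study compares against user-reported perceptions.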

Paper Structure

This paper contains 32 sections, 6 equations, 20 figures, 13 tables.

Figures (20)

  • Figure 1: Example of different item attributes for a single interacted item. Attributes are shown in orange, the relations between attributes and items in blue, and items in green.
  • Figure 2: Workflow of the conducted rapid literature review
  • Figure 3: Distribution of the papers found by our rapid literature review by year of publication.
  • Figure 4: Distribution of papers that performed offline evaluation of RSs, by the metrics used (a), the number of users used to evaluate generated explanations (b), and the chosen method (c), in relation to explanation styles.
  • Figure 5: Methodology to validate the offline metrics
  • ...and 15 more figures