Designing an Interpretable Interface for Contextual Bandits

Andrew Maher, Matia Gobbo, Lancelot Lachartre, Subash Prabanantham, Rowan Swiers, Puli Liyanagama

TL;DR

This work tackles the interpretability gap of contextual bandits for domain-expert operators by introducing a production dashboard built around a value gain metric derived from off-policy evaluation. The value gain is defined as $g(\tau) = v^{\pi} - v^{\overline{\pi}}$, where $v^{\pi} = \mathbb{E}_{\pi}[r]$ is the value of the logging policy $\pi$ and $v^{\overline{\pi}}$ is the value of a comparison policy $\overline{\pi}$, estimated via off-policy methods such as inverse propensity scoring: $v^{\overline{\pi}} \approx \frac{1}{n} \sum_{i=1}^{n} \frac{\overline{\pi}(a_i|x_i)}{\pi(a_i|x_i)} r_i$, where $(x_i, a_i, r_i)$ is the $i$-th logged context, action, and reward. The interface provides top-level, arm-level, and context-level visualizations (including a radar chart and context-contribution bars) to support ablation-style reasoning about component value. A qualitative user study with three marketing professionals indicates that, when paired with accessible explanations, technical metrics can be understood and used to guide production decisions, yielding practical design principles for future interpretable dashboards in bandit settings. The work argues for integrating rigorous, technically grounded measures with clear presentation to empower non-experts in managing complex ML systems in production.
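
The value gain computation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the logged data come from the policy $\pi$, that $v^{\pi}$ is estimated by the plain mean of the logged rewards, and that the propensities of both policies for each logged action are already available. The function names `ips_value` and `value_gain` and the toy numbers are hypothetical.

```python
from typing import Sequence


def ips_value(
    target_probs: Sequence[float],
    logging_probs: Sequence[float],
    rewards: Sequence[float],
) -> float:
    """Inverse propensity scoring estimate: mean of (pi_bar / pi) * r.

    target_probs[i]  -- probability pi_bar(a_i | x_i) under the comparison policy
    logging_probs[i] -- probability pi(a_i | x_i) under the logging policy
    rewards[i]       -- observed reward r_i
    """
    n = len(rewards)
    if n == 0 or not (len(target_probs) == len(logging_probs) == n):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p_bar, p, r in zip(target_probs, logging_probs, rewards):
        if p <= 0.0:
            # IPS is undefined when the logging policy could not have taken a_i.
            raise ValueError("logging propensities must be positive")
        total += (p_bar / p) * r
    return total / n


def value_gain(
    target_probs: Sequence[float],
    logging_probs: Sequence[float],
    rewards: Sequence[float],
) -> float:
    """Value gain v^pi - v^pi_bar.

    Assumption: v^pi is estimated by the empirical mean of the logged rewards,
    since the logs are taken to be collected under pi itself.
    """
    v_pi = sum(rewards) / len(rewards)
    v_pi_bar = ips_value(target_probs, logging_probs, rewards)
    return v_pi - v_pi_bar


# Toy logged data (hypothetical): four interactions with binary rewards.
rewards = [1.0, 0.0, 1.0, 0.0]
logging_probs = [0.5, 0.5, 0.5, 0.5]    # pi(a_i | x_i)
target_probs = [0.25, 0.5, 0.25, 0.5]   # pi_bar(a_i | x_i)

gain = value_gain(target_probs, logging_probs, rewards)
print(gain)  # 0.5 - 0.25 = 0.25
```

A positive gain in this sketch means the logging policy is estimated to earn more reward than the comparison policy. Plain IPS can have high variance when the logging propensities are small, so a production system might prefer a clipped or self-normalized variant.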

Abstract

Contextual bandits have become an increasingly popular solution for personalized recommender systems. Despite their growing use, the interpretability of these systems remains a significant challenge, particularly for the often non-expert operators tasked with ensuring their optimal performance. In this paper, we address this challenge by designing a new interface to explain to domain experts the underlying behaviour of a bandit. Central is a metric we term "value gain", a measure derived from off-policy evaluation to quantify the real-world impact of sub-components within a bandit. We conduct a qualitative user study to evaluate the effectiveness of our interface. Our findings suggest that by carefully balancing technical rigour with accessible presentation, it is possible to empower non-experts to manage complex machine learning systems. We conclude by outlining guiding principles that other researchers should consider when building similar such interfaces in future.

Paper Structure

This paper contains 14 sections, 4 equations, 1 figure, and 3 tables.

Figures (1)

  • Figure 1: User interface for contextual bandits. The interface comprises three main components: top-level performance, variance performance, and performance per context. Each describes a different aspect of the bandit system's performance, at an increasingly fine granularity of units.