shapr: Explaining Machine Learning Models with Conditional Shapley Values in R and Python

Martin Jullum; Lars Henry Berge Olsen; Jon Lachmann; Annabelle Redelmeier

TL;DR

The paper presents shapr, an R package (and shaprpy for Python) that provides conditional Shapley value explanations for a wide range of predictive models. It advances model interpretability by emphasizing conditional (distribution-aware) explanations, implements a comprehensive set of estimation approaches (e.g., independence, Gaussian, copula, ctree, VAEAC, regression-based), and offers iterative convergence checks, parallelization, and rich visualizations. It also covers extensions to asymmetric and causal Shapley values, supports forecasting models via explain_forecast(), and includes a Python wrapper to bring these capabilities to Python workflows. The work enables accurate, scalable, and flexible explanations in both tabular and time-series contexts, with practical guidance for method selection via the MSE_v criterion and extensive examples on real datasets.

Abstract

This paper introduces the shapr R package, a versatile tool for generating Shapley value based prediction explanations for machine learning and statistical regression models. Moreover, the shaprpy Python library brings the core capabilities of shapr to the Python ecosystem. Shapley values originate from cooperative game theory in the 1950s, but have over the past few years become a widely used method for quantifying how a model's features/covariates contribute to specific prediction outcomes. The shapr package emphasizes conditional Shapley value estimates, providing a comprehensive range of approaches for accurately capturing feature dependencies -- a crucial aspect for correct model explanation, typically lacking in similar software. In addition to regular tabular data, the shapr R package includes specialized functionality for explaining time series forecasts. The package offers a minimal set of user functions with sensible default values for most use cases while providing extensive flexibility for advanced users to fine-tune computations. Additional features include parallelized computations, iterative estimation with convergence detection, and rich visualization tools. shapr also extends its functionality to compute causal and asymmetric Shapley values when causal information is available. Overall, the shapr and shaprpy packages aim to enhance the interpretability of predictive models within a powerful and user-friendly framework.
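
To make the distinction between conditional and independence-based Shapley values concrete, the following is a minimal sketch in plain NumPy. It does not use the shapr or shaprpy API; the function names (`conditional_mean`, `shapley_values`) and the toy numbers are assumptions chosen for illustration. It assumes a linear model with jointly Gaussian features, a special case in which the contribution function v(S) = E[f(x) | x_S = x*_S] has a closed form, so no sampling or model fitting is needed.

```python
from itertools import combinations
from math import factorial

import numpy as np


def conditional_mean(x, mu, sigma, S):
    """Return E[x | x_S = x_S*] for x ~ N(mu, sigma) as a full-length vector."""
    M = len(mu)
    S = np.array(sorted(S), dtype=int)
    Sbar = np.array([j for j in range(M) if j not in set(S.tolist())], dtype=int)
    if len(S) == 0:
        return mu.copy()          # nothing observed: marginal mean
    out = x.copy()                # observed entries keep their values
    if len(Sbar) == 0:
        return out
    # Standard Gaussian conditioning formula for the unobserved block.
    shift = np.linalg.solve(sigma[np.ix_(S, S)], x[S] - mu[S])
    out[Sbar] = mu[Sbar] + sigma[np.ix_(Sbar, S)] @ shift
    return out


def shapley_values(x, mu, sigma, beta, intercept, conditional=True):
    """Exact Shapley values for the linear model f(x) = intercept + beta @ x.

    conditional=True  uses v(S) = E[f(x) | x_S = x_S*] under the Gaussian.
    conditional=False ignores dependence and fills unobserved features
    with their marginal means (the independence approach).
    """
    x, mu, beta = (np.asarray(a, dtype=float) for a in (x, mu, beta))
    sigma = np.asarray(sigma, dtype=float)
    M = len(x)

    def v(S):
        if conditional:
            z = conditional_mean(x, mu, sigma, S)
        else:
            z = mu.copy()
            idx = np.array(S, dtype=int)
            z[idx] = x[idx]
        return intercept + beta @ z

    phi = np.zeros(M)
    for j in range(M):
        others = [k for k in range(M) if k != j]
        for size in range(M):
            # Shapley weight |S|! (M - |S| - 1)! / M!
            w = factorial(size) * factorial(M - size - 1) / factorial(M)
            for S in combinations(others, size):
                phi[j] += w * (v(S + (j,)) - v(S))
    return phi


# Toy setting (illustrative numbers): feature 1 is unused by the model
# but strongly correlated with feature 0.
mu = np.zeros(3)
sigma = np.array([[1.0, 0.8, 0.0],
                  [0.8, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
beta = np.array([1.0, 0.0, 2.0])
x_explain = np.array([1.0, 1.0, -1.0])

phi_cond = shapley_values(x_explain, mu, sigma, beta, 0.5, conditional=True)
phi_indep = shapley_values(x_explain, mu, sigma, beta, 0.5, conditional=False)
print("conditional: ", phi_cond)
print("independence:", phi_indep)
```

In this toy setting the independence version assigns zero credit to the unused feature, while the conditional version shares credit between the two correlated features. Both sets of values sum to the difference between the prediction and the mean prediction. The enumeration over all coalitions grows exponentially with the number of features, so it is practical only for small examples like this one.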
