Stability theory of game-theoretic group feature explanations for machine learning models

Alexey Miroshnikov, Konstandinos Kotsiopoulos, Khashayar Filom, Arjun Ravi Kannan

TL;DR

This article studies feature attributions of Machine Learning models originating from linear game values and coalitional values, defined as operators on appropriate functional spaces. It shows analytically that grouping features based on dependencies has a stabilizing effect on the marginal operator at both the group and individual levels, and allows marginal and conditional explanations to be unified.

Abstract

In this article, we study feature attributions of Machine Learning (ML) models originating from linear game values and coalitional values defined as operators on appropriate functional spaces. The main focus is on random games based on the conditional and marginal expectations. The first part of our work formulates a stability theory for these explanation operators by establishing certain bounds for both marginal and conditional explanations. The differences between the two games are then elucidated, such as showing that the marginal explanations can become discontinuous on some naturally-designed domains, while the conditional explanations remain stable. In the second part of our work, group explanation methodologies are devised based on game values with coalition structure, where the features are grouped based on dependencies. We show analytically that grouping features this way has a stabilizing effect on the marginal operator on both group and individual levels, and allows for the unification of marginal and conditional explanations. Our results are verified in a number of numerical experiments where an information-theoretic measure of dependence is used for grouping.
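To make the objects in the abstract concrete, the following is a minimal finite-sample sketch, not the authors' implementation. It assumes the game value is the Shapley value, approximates the marginal game by averaging over a small background sample, and forms a quotient game in which each feature group acts as a single player. The function names (`shapley`, `marginal_game`, `quotient_game`), the toy model `f`, the background data, and the grouping are all hypothetical choices made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley(n_players, v):
    """Exact Shapley value of each player for a game v: frozenset -> float."""
    phi = [0.0] * n_players
    for i in range(n_players):
        others = [j for j in range(n_players) if j != i]
        for r in range(len(others) + 1):
            # Shapley weight for coalitions of size r that exclude player i.
            w = factorial(r) * factorial(n_players - r - 1) / factorial(n_players)
            for subset in combinations(others, r):
                S = frozenset(subset)
                phi[i] += w * (v(S | {i}) - v(S))
    return phi


def marginal_game(f, x, background):
    """Empirical marginal game: v(S) = mean of f(x_S, X_{-S}) over background rows."""
    def v(S):
        total = 0.0
        for row in background:
            # Features in S are fixed at x; the rest are taken from the background row.
            z = [x[j] if j in S else row[j] for j in range(len(x))]
            total += f(z)
        return total / len(background)
    return v


def quotient_game(v, groups):
    """Game played by groups: a coalition of group indices maps to the union of their features."""
    def vq(A):
        return v(frozenset(j for a in A for j in groups[a]))
    return vq


# Toy setup (assumed for illustration): features 0 and 1 interact and move together.
f = lambda z: z[0] * z[1] + z[2]
background = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
x = [3.0, 3.0, 1.0]
groups = [[0, 1], [2]]  # hypothetical dependency-based grouping

v = marginal_game(f, x, background)
phi = shapley(len(x), v)                              # individual marginal explanations
phi_q = shapley(len(groups), quotient_game(v, groups))  # group (quotient) explanations
```

Both `phi` and `phi_q` sum to `v(all features) - v(empty set)`, which is the efficiency property of the Shapley value. The sketch only illustrates how a grouped game is built from an individual one; the stability results in the paper concern the corresponding operators on function spaces and more general coalitional values.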

Paper Structure

This paper contains 58 sections, 49 theorems, 226 equations, 17 figures, 4 tables.

Key Result

Theorem 3.1

Let $h$, $X$ and $\mathcal{E}^{\text{CE}}$ be as in Definition def::condoperator. Then:

Figures (17)

  • Figure 1: Predictions and marginal Shapley values for XGBoost ($f_1$) and GBM ($f_2$) models.
  • Figure 2: Global individual and quotient explanations.
  • Figure 3: Gain stability.
  • Figure 4: Individual and quotient marginal explanations.
  • Figure 5: Individual and quotient explanation norms.
  • ...and 12 more figures

Theorems & Definitions (162)

  • Remark 2.1
  • Definition 2.1
  • Definition 2.2: consistency
  • Remark 2.2
  • Definition 3.1
  • Theorem 3.1: properties
  • proof
  • Remark 3.1
  • Corollary 3.1
  • proof
  • ...and 152 more