Technical Note: Defining and Quantifying AND-OR Interactions for Faithful and Concise Explanation of DNNs

Mingjie Li, Quanshi Zhang

TL;DR

The paper explains deep neural networks (DNNs) by quantifying the interactions between input variables. It introduces two interaction types, AND and OR, and formally defines faithfulness and conciseness for interaction-based explanations. It shows that the Harsanyi dividend is the unique faithful measure of AND interactions and gives an analogous formulation for OR interactions. A joint AND-OR framework then uses a sparsity-promoting (Lasso) objective to extract a concise set of salient symbolic concepts that faithfully reconstruct the DNN output for a subset $T$ of input variables: $v(T)=\sum_{S\subseteq T} \hat{\mathcal{I}}^{\text{AND}}(S)+\hat{\mathcal{I}}^{\text{OR}}(\emptyset)+\sum_{S\cap T\neq\emptyset}\hat{\mathcal{I}}^{\text{OR}}(S)$. The result is a principled route to explanations that reflect the model's inference logic with a compact set of concepts, potentially improving transparency and trust in DNNs.
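The reconstruction identity above can be checked on a toy model. The following is a minimal sketch, not the authors' implementation. The AND interaction is the Harsanyi dividend; the closed form used for the OR interaction is an assumption (this summary does not state it), chosen because it satisfies the OR part of the identity. The toy output `v`, its hand-made split into `v_and` and `v_or`, and all function names are hypothetical; in the paper the split is obtained with the sparsity-promoting objective rather than given.

```python
from itertools import combinations

def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def and_interactions(v, N):
    """Harsanyi dividend: I_and(S) = sum_{L subset of S} (-1)^(|S|-|L|) v(L)."""
    return {S: sum((-1) ** (len(S) - len(L)) * v(L) for L in subsets(S))
            for S in subsets(N)}

def or_interactions(v, N):
    """Assumed OR form: I_or(empty) = v(empty); for non-empty S,
    I_or(S) = -sum_{L subset of S} (-1)^(|S|-|L|) v(N minus L)."""
    N = frozenset(N)
    out = {}
    for S in subsets(N):
        if not S:
            out[S] = v(S)
        else:
            out[S] = -sum((-1) ** (len(S) - len(L)) * v(N - L)
                          for L in subsets(S))
    return out

def reconstruct(T, i_and, i_or):
    """Right-hand side of the identity: AND terms with S inside T, plus
    I_or(empty), plus OR terms whose S overlaps T."""
    and_part = sum(val for S, val in i_and.items() if S <= T)
    or_part = i_or[frozenset()] + sum(val for S, val in i_or.items() if S & T)
    return and_part + or_part

# Hypothetical toy model on three variables, split by hand (assumption).
N = frozenset({0, 1, 2})

def v_and(T):
    # Bias of 1, plus 2 when variables 0 AND 1 are both present.
    return 1.0 + 2.0 * ({0, 1} <= T)

def v_or(T):
    # 3 when variable 1 OR variable 2 is present.
    return 3.0 * bool(T & {1, 2})

def v(T):
    return v_and(T) + v_or(T)

i_and = and_interactions(v_and, N)
i_or = or_interactions(v_or, N)

for T in subsets(N):
    assert abs(reconstruct(T, i_and, i_or) - v(T)) < 1e-9

# Only a few interactions are non-zero, which is the conciseness goal.
salient_and = {S for S, val in i_and.items() if abs(val) > 1e-9}
salient_or = {S for S, val in i_or.items() if abs(val) > 1e-9}
print(sorted(map(sorted, salient_and)), sorted(map(sorted, salient_or)))
```

In this toy case the explanation needs only three non-zero terms out of the sixteen possible AND and OR interactions, yet it reproduces `v(T)` exactly on all eight subsets.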

Abstract

In this technical note, we aim to explain a deep neural network (DNN) by quantifying the encoded interactions between input variables, which reflect the DNN's inference logic. Specifically, we first rethink the definition of interactions, and then formally define faithfulness and conciseness for interaction-based explanation. To this end, we propose two kinds of interactions, i.e., the AND interaction and the OR interaction. For faithfulness, we prove the uniqueness of the AND (OR) interaction in quantifying the effect of the AND (OR) relationship between input variables. In addition, based on AND-OR interactions, we design techniques to boost the conciseness of the explanation without hurting its faithfulness. In this way, the inference logic of a DNN can be faithfully and concisely explained by a set of symbolic concepts.

Paper Structure

This paper contains 7 sections, 3 theorems, and 10 equations.

Key Result

Theorem 1

Let $\phi(i)$ denote the Shapley value (Shapley, 1953) of an input variable $i$. Then the Shapley value can be represented as a weighted sum of interaction utilities, i.e., $\phi(i)=\sum_{S\subseteq N\setminus\{i\}}\frac{1}{|S|+1} \mathcal{I}^{\text{AND}}(S\cup\{i\})$. In other words, the utili

Theorems & Definitions (3)

  • Theorem 1: Connection to the Shapley value, proved by Harsanyi (1963)
  • Theorem 2: Connection to the Shapley interaction index
  • Theorem 3: Connection to the Shapley Taylor interaction index
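
Theorem 1 can be checked numerically. The sketch below computes the Shapley value of each variable twice: once with the classical marginal-contribution formula and once as the weighted sum of AND interactions (Harsanyi dividends) stated in Theorem 1. The toy output `v` and all function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial

def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def harsanyi(v, S):
    """AND interaction: I_and(S) = sum_{L subset of S} (-1)^(|S|-|L|) v(L)."""
    return sum((-1) ** (len(S) - len(L)) * v(L) for L in subsets(S))

def shapley_classic(v, N, i):
    """Classical Shapley value: weighted marginal contributions of i."""
    n = len(N)
    total = 0.0
    for S in subsets(N - {i}):
        weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
        total += weight * (v(S | {i}) - v(S))
    return total

def shapley_from_interactions(v, N, i):
    """Theorem 1: phi(i) = sum over S in N minus {i} of I_and(S u {i}) / (|S|+1)."""
    return sum(harsanyi(v, S | {i}) / (len(S) + 1) for S in subsets(N - {i}))

# Hypothetical toy model on three variables (assumption, not from the paper).
N = frozenset({0, 1, 2})

def v(T):
    return 1.0 + 2.0 * ({0, 1} <= T) + 3.0 * bool(T & {1, 2})

for i in sorted(N):
    classic = shapley_classic(v, N, i)
    via_and = shapley_from_interactions(v, N, i)
    assert abs(classic - via_and) < 1e-9
    print(i, round(classic, 6), round(via_and, 6))
```

Each interaction containing variable `i` is shared equally among its `|S|+1` members, which is why the two computations agree.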