Technical Note: Defining and Quantifying AND-OR Interactions for Faithful and Concise Explanation of DNNs
Mingjie Li, Quanshi Zhang
TL;DR
The paper explains deep neural networks (DNNs) by quantifying the interactions between input variables that a network encodes. It introduces two interaction types, AND and OR, and formally defines faithfulness and conciseness for interaction-based explanations. For AND interactions, the Harsanyi dividend is shown to be the unique faithful decomposition; an analogous formulation is given for OR interactions. A joint AND-OR framework then uses a sparsity-promoting (Lasso) objective to extract a concise set of salient symbolic concepts that still reconstruct the DNN output for a subset $T$ of input variables: $v(T)=\sum_{S\subseteq T} \hat{\mathcal{I}}^{\text{AND}}(S)+\hat{\mathcal{I}}^{\text{OR}}(\emptyset)+\sum_{S\cap T\neq\emptyset}\hat{\mathcal{I}}^{\text{OR}}(S)$. The result is a principled route to explanations that reflect the model's inference logic with a compact set of concepts, which may improve transparency and trust in DNNs.
Abstract
In this technical note, we aim to explain a deep neural network (DNN) by quantifying the encoded interactions between input variables, which reflect the DNN's inference logic. Specifically, we first rethink the definition of interactions, and then formally define faithfulness and conciseness for interaction-based explanations. To this end, we propose two kinds of interactions, i.e., the AND interaction and the OR interaction. For faithfulness, we prove the uniqueness of the AND (OR) interaction in quantifying the effect of the AND (OR) relationship between input variables. In addition, based on AND-OR interactions, we design techniques that boost the conciseness of the explanation without hurting its faithfulness. In this way, the inference logic of a DNN can be faithfully and concisely explained by a set of symbolic concepts.
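To make the reconstruction identity $v(T)=\sum_{S\subseteq T} \hat{\mathcal{I}}^{\text{AND}}(S)+\hat{\mathcal{I}}^{\text{OR}}(\emptyset)+\sum_{S\cap T\neq\emptyset}\hat{\mathcal{I}}^{\text{OR}}(S)$ concrete, the following is a minimal Python sketch on a toy three-variable game. Several things here are assumptions rather than statements from this note: the AND effect uses the standard Harsanyi dividend $\sum_{L\subseteq S}(-1)^{|S|-|L|}v(L)$; the OR effect uses the dual form $-\sum_{L\subseteq S}(-1)^{|S|-|L|}v(N\setminus L)$ with $\hat{\mathcal{I}}^{\text{OR}}(\emptyset)=v(\emptyset)$, which is one formulation consistent with the identity; the toy function `v` and its split into `v_and` and `v_or` are hand-picked for illustration, whereas the note describes a sparsity-promoting (Lasso) objective for this step, which is not implemented here. All function names are hypothetical.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset (including the empty set)."""
    items = tuple(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def and_interactions(v, players):
    """Harsanyi dividend: I_and(S) = sum_{L subset of S} (-1)^{|S|-|L|} v(L)."""
    return {
        S: sum((-1) ** (len(S) - len(L)) * v[L] for L in subsets(S))
        for S in subsets(players)
    }


def or_interactions(v, players):
    """Assumed dual form: I_or(S) = -sum_{L subset of S} (-1)^{|S|-|L|} v(N - L)."""
    N = frozenset(players)
    effects = {frozenset(): v[frozenset()]}  # I_or(empty set) = v(empty set)
    for S in subsets(players):
        if S:
            effects[S] = -sum(
                (-1) ** (len(S) - len(L)) * v[N - L] for L in subsets(S)
            )
    return effects


def reconstruct(i_and, i_or, T):
    """Right-hand side of the reconstruction identity for a subset T."""
    and_part = sum(val for S, val in i_and.items() if S <= T)
    # S & T is non-empty only for non-empty S, so the empty set is added once.
    or_part = i_or[frozenset()] + sum(val for S, val in i_or.items() if S & T)
    return and_part + or_part


# Toy game (illustrative assumption): an AND pattern on {0, 1},
# an OR pattern on {1, 2}, and a constant bias.
players = (0, 1, 2)
v_and = {T: 2.0 * (frozenset({0, 1}) <= T) for T in subsets(players)}
v_or = {T: 1.0 + 3.0 * bool(T & frozenset({1, 2})) for T in subsets(players)}
v = {T: v_and[T] + v_or[T] for T in subsets(players)}

i_and = and_interactions(v_and, players)
i_or = or_interactions(v_or, players)

# Faithfulness check: the interactions reproduce v on all 2^n subsets.
for T in subsets(players):
    assert abs(reconstruct(i_and, i_or, T) - v[T]) < 1e-9

# Conciseness: list only the interactions with a non-negligible effect.
for name, effects in (("AND", i_and), ("OR", i_or)):
    for S, val in effects.items():
        if abs(val) > 1e-9:
            print(name, sorted(S), val)
```

On this toy game the decomposition is sparse: beyond the bias term stored in the OR effect of the empty set, only the AND interaction on {0, 1} and the OR interaction on {1, 2} are non-zero, yet together they reproduce `v` on every subset. A different split of `v` into `v_and` and `v_or` would also reconstruct `v` exactly but could spread the effect over more interactions, which is why the choice of split matters for conciseness.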
