Saliency Methods are Encoders: Analysing Logical Relations Towards Interpretation

Leonid Schwenke, Martin Atzmueller

TL;DR

The paper tackles explainability in deep learning by introducing ANDOR, a propositional-logic dataset framework that enables ground-truth-like reasoning tests for saliency methods. It formalizes information flow as a function $f: D \rightarrow C$ over inputs $D = M^l$ and defines minimal and maximal information coverages, $R^d_{min}$ and $R^d_{max}$, to approximate all valid reasonings. It then develops metrics (e.g., $NIB$, Full-DCA, Minimal-DCA) and evaluates 12 saliency methods on CNN and Transformer models across 9 ANDOR datasets. The results show two things. First, global saliency aggregations often misrepresent locally relevant information. Second, saliency rankings can encode classification information into the score order, including baseline leakage, which raises concerns about trustworthiness. The work argues for using logic-based trust tests like ANDOR to guide the development and validation of more faithful explanation methods in AI.
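The notion of an information coverage can be made concrete with a small example. The following Python sketch takes a single input $x \in D = M^l$. It then enumerates the inclusion-minimal sets of positions whose values alone fix the output of $f$. The sketch assumes that a coverage can be read as such a sufficient position set. This reading, the toy classifier `toy_f` (two 2-input AND gates joined by an OR) and all helper names are hypothetical illustrations. They are not the paper's definitions of $R^d_{min}$ or $R^d_{max}$.

```python
from itertools import combinations, product


# Hypothetical toy classifier over M = {0, 1}, l = 4: (x0 AND x1) OR (x2 AND x3).
# Illustration only; this is not the paper's ANDOR construction.
def toy_f(x):
    return int((x[0] and x[1]) or (x[2] and x[3]))


def is_sufficient(f, x, positions, alphabet):
    """True if fixing x on `positions` yields f(x) for every completion of the rest."""
    free = [i for i in range(len(x)) if i not in positions]
    target = f(x)
    for values in product(alphabet, repeat=len(free)):
        candidate = list(x)
        for i, v in zip(free, values):
            candidate[i] = v
        if f(candidate) != target:
            return False
    return True


def minimal_sufficient_sets(f, x, alphabet):
    """Inclusion-minimal position sets that fix the class of x (brute force)."""
    found = []
    for size in range(len(x) + 1):
        for subset in combinations(range(len(x)), size):
            s = set(subset)
            # A superset of an already found sufficient set is not minimal.
            if any(prev <= s for prev in found):
                continue
            if is_sufficient(f, x, s, alphabet):
                found.append(s)
    return [tuple(sorted(s)) for s in found]


print(minimal_sufficient_sets(toy_f, (1, 1, 1, 1), (0, 1)))  # [(0, 1), (2, 3)]
print(minimal_sufficient_sets(toy_f, (0, 0, 0, 0), (0, 1)))  # [(0, 2), (0, 3), (1, 2), (1, 3)]
```

The two outputs roughly mirror the redundant and complementary cases mentioned in the abstract:

- For `(1, 1, 1, 1)`, either gate alone suffices to fix the class.
- For `(0, 0, 0, 0)`, one zero from each gate is needed jointly.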

Abstract

As neural network architectures increase in performance, they also become more complex, which makes explainability necessary. Consequently, many new and improved methods are emerging, which often generate so-called saliency maps to improve interpretability. These methods are often evaluated against visual expectations, which typically leads to confirmation bias. Several factors make evaluation difficult: the lack of a general metric for explanation quality, inaccessible ground-truth data about the model's reasoning, and the large number of assumptions involved. As a result, multiple works claim to find flaws in these methods. However, this often results in unfair comparison metrics. Additionally, the complexity of most datasets (mostly images or text) is often so high that approximating all possible explanations is not feasible. For these reasons, this paper introduces a test for saliency map evaluation: controlled experiments based on all possible model reasonings over multiple simple logical datasets. Using the contained logical relationships, we aim to understand how different saliency methods treat information in different class-discriminative scenarios (e.g., via complementary and redundant information). By introducing multiple new metrics, we analyse propositional logical patterns relative to a non-informative attribution-score baseline to find deviations from typical expectations. Our results show that saliency methods can encode classification-relevant information into the ordering of saliency scores.

Paper Structure

This paper contains 22 sections, 35 figures, 1 table.

Figures (35)

  • Figure 1: The three ANDOR test-parameter instances used in the experiments. Because all possible inputs are included, the resulting datasets have sizes $2^8=256$ (2inBinary), $4^8=65{,}536$ (2inQuaternary) and $2^{12}=4{,}096$ (3inBinary).
  • Figure 2: Framework for the ANDOR dataset.
  • Figure 3: Average Random Forest importance with standard deviation on the split test set, where the DL model reached 100% accuracy.
  • Figure 4: Average saliency scores with standard deviation per logic gate and saliency method on the split test set, across all trained DL models that reached 100% accuracy, cf. Figure 3.
  • Figure 5: Average NIB with standard deviation per class/top-level on the split test set, across all DL models with 100% accuracy.
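The dataset sizes given for Figure 1 follow from enumerating every input in $D = M^l$, so that $|D| = |M|^l$. The Python sketch below reproduces the three sizes by brute-force enumeration. It assumes that the symbols of $M$ can be encoded as the integers $0, \dots, |M|-1$. The `(|M|, l)` pairs are read off the exponents in the caption. The dictionary keys and function names are hypothetical and are not taken from the paper's code.

```python
from itertools import product

# (|M|, l) pairs read off the exponents 2^8, 4^8 and 2^12 in the Figure 1 caption.
# Keys and helper names are illustrative, not the paper's configuration format.
CONFIGS = {
    "2inBinary": (2, 8),
    "2inQuaternary": (4, 8),
    "3inBinary": (2, 12),
}


def enumerate_inputs(alphabet_size, length):
    """Iterate over every input in D = M^l, encoding M as {0, ..., alphabet_size - 1}."""
    return product(range(alphabet_size), repeat=length)


def dataset_size(alphabet_size, length):
    """Count |D| by exhaustive enumeration (cheap for these small settings)."""
    return sum(1 for _ in enumerate_inputs(alphabet_size, length))


for name, (m, l) in CONFIGS.items():
    n = dataset_size(m, l)
    assert n == m ** l  # closed form |D| = |M|^l
    print(f"{name}: {n:,} inputs")
```

Exhaustive enumeration stays practical here because $|M|^l$ is at most 65,536. The same brute-force approach would quickly become infeasible for image- or text-sized inputs, which is the limitation the abstract points to.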