A redescription mining framework for post-hoc explaining and relating deep learning models

Matej Mihelčić, Ivan Grubišić, Miha Keber

TL;DR

The paper addresses the challenge of explaining deep learning models by introducing ExItNeRdoM, a model-agnostic redescription mining framework that post-hoc relates neuron activations to domain attributes and target labels, both within and across models. It combines uniform binning of activations, PCT-based rule generation, and multi-view redescription mining to produce interpretable redescriptions and rules, enabling pedagogical and decompositional explanations. The approach is evaluated on randomization tests and across CNNs, ResNets, MLPs, and BERT-like models, showing strong coverage, meaningful neuron-role associations, and competitive fidelity against state-of-the-art rule-extraction methods. The framework scales via parallel computation and constraint-based strategies, offering a powerful tool for researchers and practitioners to interpret complex DLMs and relate their components to domain knowledge. Overall, ExItNeRdoM provides a versatile, extensible pathway to understand how neural activations correspond to observable attributes and predictions, with clear benefits for transparency and scientific insight.

Abstract

Deep learning models (DLMs) achieve increasingly high performance on both structured and unstructured data. They have significantly extended the applicability of machine learning to various domains. Their success in making predictions, detecting patterns and generating new data has made a significant impact on science and industry. Despite these accomplishments, DLMs are difficult to explain because of their enormous size. In this work, we propose a novel framework for post-hoc explaining and relating DLMs using redescriptions. The framework allows cohort analysis of arbitrary DLMs by identifying statistically significant redescriptions of neuron activations. It allows coupling neurons to a set of target labels or sets of descriptive attributes, relating layers within a single DLM or associating different DLMs. The proposed framework is independent of the artificial neural network architecture and can work with more complex target labels (e.g. multi-label or multi-target scenarios). Additionally, it can emulate both the pedagogical and the decompositional approach to rule extraction. These properties can increase the explainability and interpretability of arbitrary DLMs by providing different information compared to existing explainable-AI approaches.

Paper Structure (26 sections, 2 equations, 7 figures, 7 tables, 3 algorithms)

This paper contains 26 sections, 2 equations, 7 figures, 7 tables, 3 algorithms.

Figures (7)

  • Figure 1: The overall computation flow of analysing DLMs. a) The main data inputs to the RM algorithm are views $W$, with the $|E|=N$ entities as rows $\{X_1, \dots, X_{N}\}$ and the attributes $\{A_1, \dots, A_{M}\} \subset \mathcal{A}$ as columns. Attributes can be domain data features or hidden representation values, i.e. neuron activations after forward propagation of an entity through a trained DLM. b) Arbitrary views can be chosen as input to ExItNeRdoM, even views from different DLMs, as long as they represent the same set of entities. The attribute $L_kn_i$ represents the activation of the $i$-th neuron in the $k$-th layer, i.e. $n_{i,k}$. c) The outputs of the framework. Rule ExItNeRdoM-1 states that the activation of MLP neuron $n_{in,9}$ is in $[0,0.485]$ if and only if the Cognitive Dementia Rating Sum of boxes, CDRS_bl, is in $[1,10]$. To understand the strength of the discovered equivalence, we need information about $supp(\texttt{ExItNeRdoM}\text{-}1)$ and the corresponding Jaccard measure, see Section \ref{sec:exp}.
  • Figure 2: A diagram of the ExItNeRdoM methodology. The methodology can accept multiple views describing the same set of entities. Blue boxes are the main steps of the methodology, and green boxes represent the corresponding intermediate and final outputs. Lines on the left correspond to lines in Algorithm \ref{alg:ExpInt}.
  • Figure 3: Creating $2$-view redescriptions. Bins of the input attribute $A$ from view $W_c$ are used to create the classes of a multi-class classification problem ($class_i$) and rules ($rule_i$). The classes are used to create a PCT and a supplementing random forest model on view $W_p$, $p\neq c$. The class value of entity $e_k$ equals $class_i$ if the selected attribute value of $e_k$ falls in the $i$-th bin. The obtained models are transformed to rules. Rules obtained from the bins of $A$ and rules obtained from the models are used to create $2$-view redescriptions. Redescriptions with insufficient accuracy ($J<J_{min}$) and statistically insignificant redescriptions ($p>P_{max}$) are filtered out.
  • Figure 4: Completing incomplete redescriptions. The incomplete input redescriptions are used as targets to create a PCT on view $W_s$, $s\neq c,\ p$. The $i$-th target for entity $e_k$ is $1$ if and only if $e_k\in supp(R_i)$. The obtained PCT is transformed to rules. The rules obtained from the PCT are used to complete the incomplete input redescriptions.
  • Figure 5: Execution times in minutes of the ExItNeRdoM methodology required to relate the penultimate layers of two multilayer perceptron networks. Each network contains $30$ neurons in the penultimate layer, thus $60$ neurons are described in total. Experiments are repeated $10$ times to measure potential variability due to the use of a supplementing random forest.
  • ...and 2 more figures
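
The captions of Figures 3 and 4 describe three small building blocks: binning an attribute, scoring a candidate redescription by the Jaccard index of two supports, and turning supports into binary targets. The following is a minimal Python sketch of those steps only, not the authors' implementation. It assumes equal-width bins, entities identified by integer row indices, and supports given as Python sets; the function names (`uniform_bins`, `jaccard`, `filter_by_jaccard`, `support_targets`) and the example numbers are hypothetical. PCT and random forest training and the statistical significance test ($p>P_{max}$) are omitted.

```python
import numpy as np


def uniform_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins (ids 0..n_bins-1)."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Use interior edges only, so the maximum value falls in the last bin.
    bin_ids = np.digitize(values, edges[1:-1], right=False)
    return bin_ids, edges


def jaccard(support_a, support_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two entity supports."""
    a, b = set(support_a), set(support_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_by_jaccard(bin_support, candidate_supports, j_min):
    """Keep indices of candidate rules whose support reaches J >= j_min."""
    return [i for i, supp in enumerate(candidate_supports)
            if jaccard(bin_support, supp) >= j_min]


def support_targets(supports, n_entities):
    """Binary target matrix T with T[k, i] = 1 iff entity k is in supp(R_i)."""
    targets = np.zeros((n_entities, len(supports)), dtype=int)
    for i, supp in enumerate(supports):
        idx = np.array(sorted(supp), dtype=int)
        targets[idx, i] = 1
    return targets


# Hypothetical activations of one neuron for six entities (view W_c).
activations = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
bin_ids, edges = uniform_bins(activations, n_bins=2)

# Support of the rule "activation falls in bin 0".
bin0_support = set(np.flatnonzero(bin_ids == 0).tolist())

# Hypothetical supports of two rules extracted from a model on view W_p.
candidates = [{0, 1, 2, 3}, {4, 5}]
kept = filter_by_jaccard(bin0_support, candidates, j_min=0.6)

# Targets for the completion step of Figure 4, one column per redescription.
targets = support_targets(candidates, n_entities=len(activations))
```

In this toy setting only the first candidate would be paired with the bin rule, since the second shares no entities with it. A real pipeline would additionally apply the significance filter before keeping a redescription.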