Readable Twins of Unreadable Models
Krzysztof Pancerz, Piotr Kulicki, Michał Kalisz, Andrzej Burda, Maciej Stanisławski, Jaromir Sarzyński
TL;DR
The paper addresses the challenge of explaining unreadable deep learning models by constructing readable twins, in the form of imprecise information flow models, built from a Sequential Information System (SIS) and rough set flow graphs (RSFGs). It details a stepwise pipeline (training, clustering activations, SIS construction, RSFG creation, path mining with an evolutionary algorithm, and visualization) and demonstrates the approach on MNIST digit classification. The key contribution is a concrete methodology that translates internal model behavior into human-readable, path-based explanations with visual summaries. The approach supports interpretability of deep models and opens avenues for ontology-based extensions and alternative graphical representations such as Petri nets.
Abstract
Creating responsible artificial intelligence (AI) systems is an important issue in contemporary AI research and development. One of the characteristics of responsible AI systems is their explainability. In this paper, we focus on explainable deep learning (XDL) systems. Drawing on the idea of digital twins of physical objects, we introduce the idea of creating readable twins (in the form of imprecise information flow models) for unreadable deep learning models. The complete procedure for moving from the deep learning model (DLM) to the imprecise information flow model (IIFM) is presented. The proposed approach is illustrated with an example of a deep learning classification model for recognizing handwritten digits from the MNIST dataset.
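To make the DLM-to-IIFM step more concrete, the following is a minimal sketch of how a table of per-layer cluster labels could be turned into a weighted flow graph. It is an illustration under stated assumptions, not the authors' implementation: the function name `flow_graph`, the toy table `sis`, and the cluster labels are hypothetical, and the edge measures (strength, certainty, coverage) follow the usual Pawlak-style flow graph definitions, which may differ in detail from those used in the paper.

```python
from collections import Counter

def flow_graph(sis_rows):
    """Build a toy flow graph from rows of per-layer cluster labels.

    sis_rows: list of tuples; each tuple holds one (assumed) cluster label
    per layer for a single input, ordered from first layer to output.
    Returns a dict mapping (source_node, target_node) to edge measures.
    """
    n = len(sis_rows)
    node_counts = Counter()
    edge_counts = Counter()
    for row in sis_rows:
        # A node is identified by its layer index and cluster label.
        for layer, label in enumerate(row):
            node_counts[(layer, label)] += 1
        # Count transitions between consecutive layers.
        for layer in range(len(row) - 1):
            src = (layer, row[layer])
            dst = (layer + 1, row[layer + 1])
            edge_counts[(src, dst)] += 1

    edges = {}
    for (src, dst), count in edge_counts.items():
        edges[(src, dst)] = {
            "strength": count / n,                  # share of all inputs on this edge
            "certainty": count / node_counts[src],  # share of the source node's flow
            "coverage": count / node_counts[dst],   # share of the target node's flow
        }
    return edges

# Hypothetical table: two hidden layers followed by the predicted digit.
sis = [
    ("a0", "b0", "7"),
    ("a0", "b0", "7"),
    ("a0", "b1", "1"),
    ("a1", "b1", "1"),
]
edges = flow_graph(sis)
for (src, dst), measures in sorted(edges.items()):
    print(src, "->", dst, measures)
```

In this sketch, a path through high-certainty edges can be read as a rule-like explanation of how inputs flow toward a class; how such paths are actually selected (the paper uses an evolutionary algorithm) is outside the scope of the example.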
