Explaining Neural Networks without Access to Training Data
Sascha Marton, Stefan Lüdtke, Christian Bartelt, Andrej Tschalzev, Heiner Stuckenschmidt
TL;DR
This paper tackles the problem of explaining neural networks without access to their training data by extending Interpretation Networks (I-Nets) to predict surrogate decision trees directly from network parameters. The I-Nets are trained on synthetic data drawn from multiple distributions and learn to map network parameters to standard and soft decision tree (DT) representations, yielding high-fidelity explanations even when the training data is unavailable. Empirical results across real-world datasets and a credit card default case study show that I-Nets outperform traditional sample-based distillation, often by large margins, especially for low-complexity surrogates. The work demonstrates the practical viability of reliable, data-free global explanations, with implications for privacy-sensitive and safety-constrained settings.
Abstract
We consider generating explanations for neural networks in cases where the network's training data is not accessible, for instance due to privacy or safety issues. Recently, $\mathcal{I}$-Nets have been proposed as a sample-free approach to post-hoc, global model interpretability that does not require access to training data. They formulate interpretation as a machine learning task that maps network representations (parameters) to a representation of an interpretable function. In this paper, we extend the $\mathcal{I}$-Net framework to the cases of standard and soft decision trees as surrogate models. We propose a suitable decision tree representation and design the corresponding $\mathcal{I}$-Net output layers. Furthermore, we make $\mathcal{I}$-Nets applicable to real-world tasks by considering more realistic distributions when generating the $\mathcal{I}$-Net's training data. We empirically evaluate our approach against traditional global, post-hoc interpretability approaches and show that it achieves superior results when the training data is not accessible.
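
To make the idea of a fixed-size decision tree representation concrete, the following is a minimal sketch of how a flat output vector could be decoded into a standard decision tree and scored for fidelity against a network. The vector layout (per-node feature scores, then thresholds, then leaf probabilities), the function names `decode_tree`, `predict`, and `fidelity`, and the toy stand-in for the network are all assumptions made for illustration; they are not taken from the paper and may differ from the authors' actual output-layer design.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical decoded form: (split feature per internal node,
# threshold per internal node, class-1 probability per leaf).
Tree = Tuple[List[int], List[float], List[float]]


def decode_tree(vec: Sequence[float], depth: int, n_features: int) -> Tree:
    """Decode a flat vector into a complete binary tree of the given depth.

    Assumed layout (illustrative only):
      [feature scores for each internal node] + [thresholds] + [leaf probabilities]
    """
    n_internal = 2 ** depth - 1
    n_leaves = 2 ** depth
    expected = n_internal * n_features + n_internal + n_leaves
    if len(vec) != expected:
        raise ValueError(f"expected {expected} values, got {len(vec)}")

    features = []
    for i in range(n_internal):
        scores = vec[i * n_features:(i + 1) * n_features]
        # The highest-scoring feature becomes the split feature of node i.
        features.append(max(range(n_features), key=lambda j: scores[j]))

    offset = n_internal * n_features
    thresholds = list(vec[offset:offset + n_internal])
    leaves = list(vec[offset + n_internal:])
    return features, thresholds, leaves


def predict(tree: Tree, x: Sequence[float]) -> int:
    """Route x from the root to a leaf (heap-style node indexing)."""
    features, thresholds, leaves = tree
    n_internal = len(features)
    node = 0
    while node < n_internal:
        go_left = x[features[node]] <= thresholds[node]
        node = 2 * node + 1 if go_left else 2 * node + 2
    return int(leaves[node - n_internal] >= 0.5)


def fidelity(tree: Tree, network: Callable[[Sequence[float]], int],
             samples: Sequence[Sequence[float]]) -> float:
    """Fraction of samples on which the surrogate agrees with the network."""
    agree = sum(predict(tree, x) == network(x) for x in samples)
    return agree / len(samples)


# Toy usage: a depth-1 tree over two features and a stand-in "network".
toy_vec = [0.1, 0.9, 0.5, 0.2, 0.8]
toy_tree = decode_tree(toy_vec, depth=1, n_features=2)
toy_network = lambda x: int(x[1] > 0.5)  # placeholder, not a real neural network
toy_score = fidelity(toy_tree, toy_network, [[0.0, 0.3], [0.0, 0.7]])
```

In this sketch the vector length depends only on the tree depth and the number of features, which is what allows a model with a fixed output layer to emit it. Note that fidelity is measured on evaluation samples here; the data-free aspect concerns how the surrogate is obtained, not how it is scored.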
