Using Slisemap to interpret physical data
Lauri Seppäläinen, Anton Björklund, Vitus Besel, Kai Puolamäki
TL;DR
Problem: interpreting high-dimensional physical data while understanding black-box predictions.
Approach: apply Slisemap, which jointly learns a 2D embedding and local explanatory models for each data item, with neighborhood weights given by $e^{-D(z_i,z_j)}$ and radius $r$, using $d=2$ and $r=3.5$.
Contributions: demonstrates the method on GeckoQ, Jets, and QM9, showing that similar explanations cluster in the embedding and that local models align with known physics/chemistry; introduces stability and explanation-quality metrics to validate the embeddings and explains how to compare to unsupervised baselines.
Findings: Slisemap embeddings reveal target-related structure and produce explanations that outperform PCA, t-SNE, and UMAP in capturing the relationship between features and the target; the approach also surfaces multiple plausible local explanations, supporting uncertainty quantification.
Significance: provides a practical workflow for physically grounded interpretation of black-box predictions and potential for domain-specific insights and uncertainty analysis.
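To make the neighborhood weighting in the summary concrete, the following is a minimal NumPy sketch, not the Slisemap implementation. It makes several assumptions that the summary does not state: $D$ is taken to be the Euclidean distance in the embedding, the weights $e^{-D(z_i,z_j)}$ are normalised per row, the radius $r$ is applied by rescaling the embedding to a root-mean-square norm of $r$, and the local models are linear with a squared loss. The function names and the random data are hypothetical and serve only as illustration.

```python
import numpy as np


def scale_embedding(Z, radius=3.5):
    """Rescale Z so the root-mean-square norm of its rows equals `radius` (assumed role of r)."""
    rms = np.sqrt(np.mean(np.sum(Z ** 2, axis=1)))
    return Z * (radius / (rms + 1e-12))


def neighbourhood_weights(Z):
    """Weights proportional to exp(-D(z_i, z_j)), with D assumed Euclidean and rows normalised."""
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    W = np.exp(-D)
    return W / W.sum(axis=1, keepdims=True)


def weighted_local_loss(X, y, B, W):
    """Sum over i, j of W[i, j] * (x_j . b_i - y_j)^2, where row i of B is local model i."""
    preds = B @ X.T  # preds[i, j]: prediction of local model i for data item j
    return float(np.sum(W * (preds - y[None, :]) ** 2))


# Synthetic data for illustration only (not GeckoQ, Jets, or QM9).
rng = np.random.default_rng(0)
n, m = 50, 4
X = rng.normal(size=(n, m))
y = X @ rng.normal(size=m)

Z = scale_embedding(rng.normal(size=(n, 2)), radius=3.5)  # d = 2, r = 3.5
B = rng.normal(size=(n, m))                               # one linear model per item
W = neighbourhood_weights(Z)
loss = weighted_local_loss(X, y, B, W)
```

In this sketch a larger radius spreads the points further apart, so each local model is weighted towards fewer neighbours. A joint optimisation over `Z` and `B`, which is the step that makes items with similar explanations end up close together, is left out here.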
Abstract
Manifold visualisation techniques are commonly used to visualise high-dimensional datasets in the physical sciences. In this paper we apply a recently introduced manifold visualisation method, called Slisemap, to datasets from physics and chemistry. Slisemap combines manifold visualisation with explainable artificial intelligence, which is used to investigate the decision processes of black box machine learning models and complex simulators. With Slisemap we find an embedding such that data items with similar local explanations are grouped together. Hence, Slisemap gives us an overview of the different behaviours of a black box model. This makes Slisemap a supervised manifold visualisation method, where the patterns in the embedding reflect a target property. We show how Slisemap can be used and evaluated on physical data, and that it helps in finding meaningful information about classification and regression models trained on these datasets.
