Using Slisemap to interpret physical data

Lauri Seppäläinen; Anton Björklund; Vitus Besel; Kai Puolamäki

TL;DR

Problem: interpreting high-dimensional physical data while understanding black-box predictions. Approach: apply Slisemap, which jointly learns a 2D embedding and local explanatory models for each data item, with neighborhood weights given by $e^{-D(z_i,z_j)}$ and radius $r$, using $d=2$ and $r=3.5$. Contributions: demonstrates the method on GeckoQ, Jets, and QM9, showing that similar explanations cluster in the embedding and that local models align with known physics/chemistry; introduces stability and explanation-quality metrics to validate the embeddings and explains how to compare to unsupervised baselines. Findings: Slisemap embeddings reveal target-related structure and produce explanations that outperform PCA, t-SNE, and UMAP in capturing the relationship between features and the target; the approach also surfaces multiple plausible local explanations, supporting uncertainty quantification. Significance: provides a practical workflow for physically grounded interpretation of black-box predictions and potential for domain-specific insights and uncertainty analysis.
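The neighbourhood weighting described above can be illustrated with a minimal NumPy sketch. It is not the Slisemap library's implementation. The row normalisation of the weights, the root-mean-square rescaling of the embedding to radius $r$, the squared-error loss, and the function names `neighbourhood_weights` and `weighted_local_losses` are all assumptions made for illustration; only the kernel $e^{-D(z_i,z_j)}$, $d=2$, and $r=3.5$ come from the text.

```python
import numpy as np

def neighbourhood_weights(Z, radius=3.5):
    """Weights proportional to exp(-D(z_i, z_j)), normalised per row (assumption)."""
    Z = np.asarray(Z, dtype=float)
    # Assumption: rescale the embedding so its root-mean-square norm equals `radius`.
    rms = np.sqrt(np.mean(np.sum(Z ** 2, axis=1)))
    Z = Z * (radius / (rms + 1e-12))
    # Pairwise Euclidean distances D[i, j] between embedded points.
    diff = Z[:, None, :] - Z[None, :, :]
    D = np.sqrt(np.sum(diff ** 2, axis=-1))
    W = np.exp(-D)
    return W / W.sum(axis=1, keepdims=True)

def weighted_local_losses(X, y, B, W):
    """Squared-error loss of each local linear model i, weighted over items j."""
    preds = B @ X.T                      # preds[i, j]: model i applied to item j
    sq_err = (preds - y[None, :]) ** 2
    return np.sum(W * sq_err, axis=1)

# Toy data (synthetic, for illustration only).
rng = np.random.default_rng(0)
n, m = 50, 3
beta = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, m))
y = X @ beta
Z = rng.normal(size=(n, 2))              # d = 2 embedding coordinates
B = np.tile(beta, (n, 1))                # one local linear model per data item
W = neighbourhood_weights(Z, radius=3.5)
losses = weighted_local_losses(X, y, B, W)
```

In this sketch each row of `W` concentrates on the items closest to item `i` in the embedding, so a local model is rewarded mainly for fitting its embedding neighbours. A joint optimisation over `Z` and `B` would then pull items with similar local models together.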

Abstract

Manifold visualisation techniques are commonly used to visualise high-dimensional datasets in the physical sciences. In this paper we apply a recently introduced manifold visualisation method, called Slisemap, to datasets from physics and chemistry. Slisemap combines manifold visualisation with explainable artificial intelligence, which is used to investigate the decision processes of black box machine learning models and complex simulators. With Slisemap we find an embedding such that data items with similar local explanations are grouped together. Hence, Slisemap gives us an overview of the different behaviours of a black box model. This makes Slisemap a supervised manifold visualisation method, where the patterns in the embedding reflect a target property. We show how Slisemap can be used and evaluated on physical data, and that it helps in finding meaningful information about classification and regression models trained on these datasets.

Paper Structure (14 sections, 9 equations, 6 figures, 1 table)

Introduction
Related work
Methods
SLISEMAP
Workflow and Performance Measures
Use Cases
Atmospheric relevant organic molecules: GeckoQ
Elementary Particle Jets
Small Organic Molecules: QM9
Evaluation of the solutions
Discussion and Conclusions
Chosen explainable features
Incidence of functional groups by cluster in GeckoQ
Explanation metrics

Figures (6)

Figure 1: The Slisemap embedding of the GeckoQ data (left panel). The number of clusters (seven) was chosen by visual inspection. The right panel shows the average local-model coefficients of each cluster.
Figure 2: (a) Fraction of molecules that contain at least one functional group (FG) of type hydroxyl, hydroperoxide, carboxylic acid, or carbonylperoxyacid, shown for cluster 1, cluster 6, and all clusters combined. (b) Slisemap and (c) t-SNE embeddings, where the data points are binned and the colour map corresponds to the median normalised target of each bin. Clusters 1 and 6 are encircled.
Figure 3: Slisemap solution for 10,000 jets from the particle jets dataset. The data items have been clustered according to the local models. The coefficients for all local models follow physical theory but with varying magnitudes.
Figure 4: Slisemap embedding for the QM9 data set (10,000 molecules), clustered into 4 clusters. The right panel shows the ten most influential features for predicting the target (HOMO energy).
Figure 5: Permutation loss, local model stability and neighbourhood stability (see Workflow and Performance Measures) for the use cases as a function of (resampled) dataset size. Smaller values are better. Dashed lines indicate reference values for baseline datasets where the target variables have been permuted randomly.
...and 1 more figure
