LAVA: Explainability for Unsupervised Latent Embeddings

Ivan Stresec, Joana P. Gonçalves

TL;DR

LAVA tackles explainability for unsupervised latent embeddings by linking local embedding organization to input feature covariation. It defines probe-centered localities in latent space, represents each locality by its pairwise feature correlations, and extracts recurring correlation modules via Association Matrix Factorization, yielding stable explanations at a user-controlled level of granularity. The approach is demonstrated on MNIST and KPMP single-cell gene expression embeddings, uncovering visually meaningful pixel subpatterns and disease-associated gene correlations. It is shown to be robust to hyperparameter choices and to reveal patterns missed by existing explainability methods. Overall, LAVA provides a model-agnostic, post-hoc tool that expands interpretability and knowledge discovery in unsupervised manifold learning.
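
As a rough illustration of the first two steps (localities, then per-locality feature correlations), here is a minimal NumPy sketch. It is not the authors' implementation: the function name `locality_correlations`, the random choice of probes, the Euclidean latent distance, and the use of Pearson correlation are all assumptions made for this example.

```python
import numpy as np

def locality_correlations(X, Z, probe_idx, n):
    """Return one row of pairwise feature correlations per locality.

    X: (samples, features) original input data
    Z: (samples, dims) latent embeddings of the same samples
    probe_idx: indices of samples used as probes (hypothetical choice)
    n: neighborhood size
    """
    n_features = X.shape[1]
    iu = np.triu_indices(n_features, k=1)  # each unordered feature pair once
    rows = []
    for p in probe_idx:
        # Locality: the n samples closest to the probe in latent space
        # (Euclidean distance is an assumption of this sketch)
        dist = np.linalg.norm(Z - Z[p], axis=1)
        members = np.argsort(dist)[:n]
        # Correlations are computed on the ORIGINAL features of those samples
        with np.errstate(invalid="ignore", divide="ignore"):
            corr = np.corrcoef(X[members], rowvar=False)
        # Constant features have undefined correlation; treat it as 0 here
        corr = np.nan_to_num(corr)
        rows.append(corr[iu])
    return np.vstack(rows)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))   # toy inputs: 200 samples, 5 features
Z = rng.normal(size=(200, 2))   # toy 2-D embeddings
probes = rng.choice(200, size=10, replace=False)
C = locality_correlations(X, Z, probes, n=30)
print(C.shape)  # (10, 10): 10 localities, 10 feature pairs
```

Each row of `C` describes one locality and each column one feature pair, matching the layout of the locality matrix that the paper's factorization step takes as input.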

Abstract

Unsupervised black-box models are drivers of scientific discovery, yet are difficult to interpret, as their output is often a multidimensional embedding rather than a well-defined target. While explainability for supervised learning uncovers how input features contribute to predictions, its unsupervised counterpart should relate input features to the structure of the learned embeddings. However, adaptations of supervised model explainability for unsupervised learning provide either single-sample or dataset-summary explanations, remaining too fine-grained or reductive to be meaningful, and cannot explain embeddings without mapping functions. To bridge this gap, we propose LAVA, a post-hoc model-agnostic method to explain local embedding organization through feature covariation in the original input data. LAVA explanations comprise modules, capturing local subpatterns of input feature correlation that reoccur globally across the embeddings. LAVA delivers stable explanations at a desired level of granularity, revealing domain-relevant patterns such as visual parts of images or disease signals in cellular processes, otherwise missed by existing methods.

Paper Structure

This paper contains 41 sections, 11 equations, 39 figures, 2 tables, 1 algorithm.

Figures (39)

  • Figure 1: LAVA method overview. a, LAVA explains embeddings of a dataset (example UMAP embeddings of MNIST). b, Based on neighborhood size $n$, LAVA represents embeddings as a set of localities: overlapping neighborhoods centered around probes. c, For each locality, LAVA calculates feature-to-feature correlations using the original samples of that locality (top right, locality correlations; bottom right, heatmap showing the sum of correlations per pixel, with added lines connecting correlated pixels). d, LAVA extracts modules, shared subpatterns of correlation extracted from localities that serve as explanations, describing feature covariation across the embeddings.
  • Figure 2: LAVA association matrix factorization (AMF). a, Locality $i$ is reconstructed by multiplying each module by its presence for $i$, then taking the maximum per entry across the $M$ extracted modules. b, Localities are represented as matrix $\boldsymbol{C}$: each row is a locality, each column a pair of features, and each entry the pairwise feature correlation in that locality. Module extraction approximates $\boldsymbol{C}$ with a maximum per entry based on modules $\boldsymbol{M}$ and their presences $\boldsymbol{P}$: it calculates an outer product between each column of $\boldsymbol{P}$ (superscript) and the corresponding row of $\boldsymbol{M}$, yielding a tensor of $M$ matrix slices of the same shape as $\boldsymbol{C}$ (Equation \ref{eq:module_outer_product}). Each slice represents module-contributed correlations across the localities. Reducing the $M$ dimension by taking the per-element maximum yields the reconstructed locality dataset.
  • Figure 3: LAVA stability experiments. Top: results for MNIST; bottom: results for KPMP UMAP embeddings. a, Cosine similarity of average correlations of locality placements when varying the overlap hyperparameter. b, Statistics of module extraction runs. c-d, Cosine similarity between presence-weighted averaged modules c, across different locality placements and d, within one locality placement.
  • Figure 4: LAVA explanations. Labeled UMAP embeddings and LAVA explanations (modules & presences) for a, MNIST and b, KPMP.
  • Figure 5: LAVA module co-presences and locality reconstruction. Co-presence as entropy of module presences per locality for UMAP embeddings of a, MNIST and b, KPMP. c, Reconstruction of an example MNIST locality using its co-present modules.
  • ...and 34 more figures
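
The max-based reconstruction described in the Figure 2 caption can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the paper's code: the function name `reconstruct_localities` and the toy values are hypothetical, non-negative presences and module entries are assumed, and the number of modules is written `K` here to keep it distinct from the module matrix `M`.

```python
import numpy as np

def reconstruct_localities(P, M):
    """Reconstruct the locality matrix from presences and modules.

    P: (L, K) presences, one column per module (assumed non-negative)
    M: (K, F) modules, one row per module over F feature pairs
    Returns an (L, F) reconstruction of the locality matrix C.
    """
    # Outer product of each presence column with its module row:
    # a (K, L, F) tensor with one (L, F) slice per module
    slices = np.einsum("lk,kf->klf", P, M)
    # Reduce the module axis with a per-entry maximum
    return slices.max(axis=0)

# Toy example: 2 localities, 2 modules, 3 feature pairs
P = np.array([[1.0, 0.0],
              [0.5, 1.0]])
M = np.array([[0.8, 0.0, 0.2],
              [0.0, 0.6, 0.4]])
C_hat = reconstruct_localities(P, M)
print(C_hat)
```

In the toy example, the first locality contains only the first module, so its reconstruction equals that module's row. The second locality takes each entry from whichever presence-scaled module is larger there. Using a maximum rather than a sum means overlapping modules do not add up in entries they share.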