LAVA: Explainability for Unsupervised Latent Embeddings
Ivan Stresec, Joana P. Gonçalves
TL;DR
LAVA tackles explainability for unsupervised latent embeddings by linking local embedding organization to input feature covariation. It defines probe-centered localities in latent space, represents each locality by its pairwise feature correlations, and extracts recurring correlation modules via Association Matrix Factorization, yielding stable explanations at a user-controlled level of granularity. The approach is demonstrated on MNIST and KPMP single-cell gene expression embeddings, where it uncovers visually meaningful pixel subpatterns and disease-associated gene correlations. It is shown to be robust to hyperparameter choices and to reveal patterns missed by existing explainability methods. Overall, LAVA provides a model-agnostic, post-hoc tool that expands interpretability and knowledge discovery in unsupervised manifold learning.
Abstract
Unsupervised black-box models are drivers of scientific discovery, yet are difficult to interpret, as their output is often a multidimensional embedding rather than a well-defined target. While explainability for supervised learning uncovers how input features contribute to predictions, its unsupervised counterpart should relate input features to the structure of the learned embeddings. However, adaptations of supervised model explainability for unsupervised learning provide either single-sample or dataset-summary explanations, remaining too fine-grained or reductive to be meaningful, and cannot explain embeddings without mapping functions. To bridge this gap, we propose LAVA, a post-hoc model-agnostic method to explain local embedding organization through feature covariation in the original input data. LAVA explanations comprise modules, capturing local subpatterns of input feature correlation that reoccur globally across the embeddings. LAVA delivers stable explanations at a desired level of granularity, revealing domain-relevant patterns such as visual parts of images or disease signals in cellular processes, otherwise missed by existing methods.
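To make the pipeline summarized above more concrete, the following is a minimal sketch of its general shape, not the authors' implementation. It rests on several assumptions that the text here does not specify: localities are taken as the k nearest neighbours of randomly chosen probe samples in the embedding, feature covariation is measured by absolute Pearson correlation, and a plain non-negative matrix factorization with multiplicative updates stands in for Association Matrix Factorization. All function and parameter names (`local_correlations`, `nmf`, `n_probes`, `k`, `n_modules`) are hypothetical.

```python
import numpy as np

def local_correlations(X, Z, n_probes=20, k=50, seed=0):
    """Describe probe-centred localities of the embedding Z by the absolute
    pairwise correlations of the input features X (assumed design choices)."""
    rng = np.random.default_rng(seed)
    probes = rng.choice(len(Z), size=n_probes, replace=False)
    iu = np.triu_indices(X.shape[1], k=1)        # unique feature pairs
    rows = []
    for p in probes:
        dist = np.linalg.norm(Z - Z[p], axis=1)  # distances in latent space
        idx = np.argsort(dist)[:k]               # the probe's locality
        with np.errstate(invalid="ignore", divide="ignore"):
            C = np.corrcoef(X[idx], rowvar=False)
        C = np.nan_to_num(C)                     # constant features -> 0
        rows.append(np.abs(C[iu]))
    return probes, np.vstack(rows)               # (n_probes, n_pairs)

def nmf(A, n_modules, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF via multiplicative updates; a stand-in, NOT the paper's AMF."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], n_modules))      # module weight per locality
    H = rng.random((n_modules, A.shape[1]))      # module = pattern over pairs
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic stand-in data: 300 samples, 6 input features, 2-D "embedding".
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
Z = X[:, :2] + 0.1 * rng.normal(size=(300, 2))

probes, A = local_correlations(X, Z, n_probes=10, k=40)
W, H = nmf(A, n_modules=3)
```

In this sketch each row of `H` is a candidate module (a weighting over feature pairs) and each row of `W` indicates how strongly the modules are expressed in one locality. The number of modules plays the role of a granularity control here, which is only an analogy to the user-controlled granularity described in the text.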
