VISTA: A Panoramic View of Neural Representations
Tom White
TL;DR
VISTA tackles the challenge of interpreting high-dimensional neural representations by constructing a semantic cartography that maps representations into a 2D space using clustering and text-to-image diffusion. It combines representation encoding, dimensionality reduction with UMAP, and a tiled MultiDiffusion panorama to render interactive visual maps, whose fidelity is quantified by a mutual-kNN gain metric. In a case study on Gemma2-2B sparse autoencoder latents, VISTA corroborates some automated LLM-based interpretations and also reveals associations that those methods do not capture. The results suggest that visual, interactive exploration can complement automated interpretability pipelines and may facilitate cross-model understanding of latent concepts across domains.
Abstract
We present VISTA (Visualization of Internal States and Their Associations), a novel pipeline for visually exploring and interpreting neural network representations. VISTA addresses the challenge of analyzing vast multidimensional spaces in modern machine learning models by mapping representations into a semantic 2D space. The resulting collages visually reveal patterns and relationships within internal representations. We demonstrate VISTA's utility by applying it to sparse autoencoder latents, uncovering new properties and interpretations. We review the VISTA methodology, present findings from our case study (https://got.drib.net/latents/), and discuss implications for neural network interpretability across various domains of machine learning.
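To make the idea of a neighborhood-based fidelity measure concrete, the sketch below compares k-nearest-neighbor sets in the original representation space with those in a 2D layout. It is an illustration only: the exact definition of the mutual-kNN gain metric is not given in this section, so the function names (`knn_indices`, `mutual_knn_overlap`, `knn_gain`) and the choice of a shuffled-layout baseline for the "gain" are assumptions, not the authors' implementation.

```python
import numpy as np


def knn_indices(points, k):
    """Return the indices of the k nearest neighbors of each row (Euclidean)."""
    sq = np.sum(points ** 2, axis=1)
    # Pairwise squared distances via the expansion |a - b|^2 = |a|^2 + |b|^2 - 2ab.
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def mutual_knn_overlap(high, low, k=10):
    """Mean fraction of each point's k neighbors shared between the two spaces."""
    nn_high = knn_indices(high, k)
    nn_low = knn_indices(low, k)
    shared = [len(set(a) & set(b)) for a, b in zip(nn_high, nn_low)]
    return float(np.mean(shared)) / k


def knn_gain(high, low, k=10, seed=0):
    """ASSUMPTION: 'gain' is taken here as overlap minus a shuffled-layout baseline."""
    rng = np.random.default_rng(seed)
    shuffled = low[rng.permutation(len(low))]  # same positions, random assignment
    return mutual_knn_overlap(high, low, k) - mutual_knn_overlap(high, shuffled, k)
```

In practice `high` would hold the encoded representations and `low` the 2D coordinates produced by the dimensionality reduction step; a gain near zero would indicate that the layout preserves neighborhoods no better than a random assignment of points to positions.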
