VISTA: A Panoramic View of Neural Representations

Tom White

TL;DR

VISTA tackles the challenge of interpreting high-dimensional neural representations by constructing a semantic cartography that maps representations into a 2D space via clustering and text-to-image diffusion. It integrates representation encoding, dimensionality reduction with UMAP, and a tiled MultiDiffusion panorama to render interactive visual maps whose fidelity is quantified by a mutual-knn gain metric. In a case study on Gemma2-2B sparse autoencoder latents, VISTA both corroborates some automated LLM-based interpretations and reveals additional associations not captured by those methods. The approach suggests that visual, interactive exploration can complement automated interpretability pipelines and may facilitate cross-model understanding of latent concepts across domains.
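To make the fidelity measure more concrete, here is a minimal sketch of a mutual-knn score between a high-dimensional representation and its 2D layout. It is not the paper's implementation: the function names (`knn_indices`, `mutual_knn`, `mutual_knn_gain`) are hypothetical, and the definition of "gain" as overlap minus the chance level k / (n - 1) is an assumption made only for illustration.

```python
import numpy as np


def knn_indices(points: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbors of each row, excluding the row itself."""
    sq = np.sum(points ** 2, axis=1)
    # Squared Euclidean distances between all pairs of rows.
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def mutual_knn(high: np.ndarray, low: np.ndarray, k: int) -> float:
    """Mean fraction of the k nearest neighbors shared between the two spaces."""
    nn_high = knn_indices(high, k)
    nn_low = knn_indices(low, k)
    shared = [len(set(a) & set(b)) for a, b in zip(nn_high, nn_low)]
    return float(np.mean(shared)) / k


def mutual_knn_gain(high: np.ndarray, low: np.ndarray, k: int) -> float:
    """ASSUMPTION: gain = overlap minus the overlap expected by chance, k / (n - 1)."""
    n = high.shape[0]
    return mutual_knn(high, low, k) - k / (n - 1)
```

Under this assumed definition, a layout that preserves every neighborhood scores close to 1, while a layout unrelated to the original space scores close to 0. Sweeping `k` would produce a curve of the kind summarized in the Figure 4 caption.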

Abstract

We present VISTA (Visualization of Internal States and Their Associations), a novel pipeline for visually exploring and interpreting neural network representations. VISTA addresses the challenge of analyzing vast multidimensional spaces in modern machine learning models by mapping representations into a semantic 2D space. The resulting collages visually reveal patterns and relationships within internal representations. We demonstrate VISTA's utility by applying it to sparse autoencoder latents, uncovering new properties and interpretations. We review the VISTA methodology, present findings from our case study (https://got.drib.net/latents/), and discuss implications for neural network interpretability across various domains of machine learning.

Paper Structure

This paper contains 11 sections, 7 figures.

Figures (7)

  • Figure 1: Activations of Gemma2-2B residual latent 20-9745 as a VISTA map aggregating thousands of inputs. The automated explanation given to this latent by Gemma Scope is "references to muscle-related subjects and terminology". Zooming into the upper cluster (far left) reveals a collection of muscle-related themes. However, examining the lower cluster (far right) reveals a collection of subclusters sharing only word morphology, such as "mustache", "mushroom", and "mystical". https://got.drib.net/latents/muscle/
  • Figure 2: VISTA map for latent 20-5011 (left) with detail view (right). https://got.drib.net/latents/ingredients/.
  • Figure 3: VISTA map for latent 20-9220 (left) shows high-level patterns such as the stripes in the detail view (center). Zooming in further (right), we discover distinct clusters of animal combinations ("resembling a bear or big"), sky combinations ("looking at a sunset or sunrise"), and style combinations ("a medieval or fantasy setting"). https://got.drib.net/latents/indebted/
  • Figure 4: Mutual-knn gain of the VISTA map in case study one, with a maximum at k = 9% (360).
  • Figure 5: VISTA map for "ingredients" = latent gemma-2-2b/20-res-16k/5011. https://got.drib.net/latents/ingredients/
  • ...and 2 more figures