
Visualizing LLM Latent Space Geometry Through Dimensionality Reduction

Alex Ning, Vainateya Rangaraju, Yen-Ling Kuo

TL;DR

The paper tackles the challenge of interpreting LLM internals by visualizing the latent-space geometry of decoder-only Transformers (GPT-2 and LLaMa) using PCA and UMAP. It introduces a reproducible pipeline that captures latent states at six points within each Transformer block, processes these high-dimensional states, and projects them into interpretable 2D representations, revealing geometry related to layer, sequence position, and component type. Key findings include a persistent separation between attention and MLP outputs in latent space, large norms at the initial sequence position, and distinct sequence-position effects tied to GPT-2’s learned positional embeddings and LLaMa’s rotary position embeddings (RoPE). The work advances mechanistic interpretability by highlighting concrete geometric signatures of feature representations and by offering a codebase for broader analysis and future studies across models and datasets.
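
To make the six capture points concrete, here is a minimal NumPy sketch of one pre-norm Transformer block that records the state at each point, numbered as in the Figure 3 caption (points 1 and 4: normalization outputs; 2 and 5: attention and MLP outputs; 3 and 6: residual stream after each addition). It is a toy under stated assumptions, not the authors' extraction code: random weights, a single attention head, and normalization without a learned gain or bias (GPT-2 uses LayerNorm, LLaMa uses RMSNorm). The helper names (`block_with_captures`, `random_params`, `records`) are hypothetical. On a real model the same states would typically be read out with forward hooks.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each position's vector to zero mean and unit variance (no learned gain/bias).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def causal_self_attention(x, w_q, w_k, w_v, w_o):
    # Single-head causal attention: position i attends only to positions <= i.
    seq_len, d = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(d)
    future = np.triu(np.ones((seq_len, seq_len)), k=1).astype(bool)
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights @ v) @ w_o


def mlp(x, w_in, w_out):
    # Two-layer MLP with the tanh approximation of GELU.
    h = x @ w_in
    h = 0.5 * h * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (h + 0.044715 * h**3)))
    return h @ w_out


def block_with_captures(resid, params):
    """Run one pre-norm block; return the new residual and the six captured states."""
    caps = {}
    caps[1] = layer_norm(resid)                                # pre-attention norm output
    caps[2] = causal_self_attention(caps[1], *params["attn"])  # attention output
    caps[3] = resid + caps[2]                                  # residual after attention add
    caps[4] = layer_norm(caps[3])                              # pre-MLP norm output
    caps[5] = mlp(caps[4], *params["mlp"])                     # MLP output
    caps[6] = caps[3] + caps[5]                                # residual after MLP add
    return caps[6], caps


rng = np.random.default_rng(0)
d_model, seq_len, n_layers = 16, 8, 3


def random_params(scale=0.1):
    # Random toy weights; a real model would supply its trained parameters.
    return {
        "attn": [rng.normal(0, scale, (d_model, d_model)) for _ in range(4)],
        "mlp": [rng.normal(0, scale, (d_model, 4 * d_model)),
                rng.normal(0, scale, (4 * d_model, d_model))],
    }


layers = [random_params() for _ in range(n_layers)]
resid = rng.normal(size=(seq_len, d_model))  # stand-in for token + position embeddings

# One record per (layer, capture point, position): the state plus its metadata.
records = []
for layer_idx, params in enumerate(layers):
    resid, caps = block_with_captures(resid, params)
    for point, states in caps.items():
        for pos in range(seq_len):
            records.append({"layer": layer_idx, "point": point, "pos": pos, "state": states[pos]})

print(len(records))  # 3 layers x 6 points x 8 positions = 144
```

Keeping each state next to its layer, capture point, and position is what later lets a 2D projection be colored by any of those attributes.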

Abstract

Large language models (LLMs) achieve state-of-the-art results across many natural language tasks, but their internal mechanisms remain difficult to interpret. In this work, we extract, process, and visualize latent state geometries in Transformer-based language models through dimensionality reduction. We capture layerwise activations at multiple points within Transformer blocks and enable systematic analysis through Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP). We conduct experiments on GPT-2 and LLaMa models and uncover interesting geometric patterns in latent space. Notably, we identify a clear separation between attention and MLP component outputs across intermediate layers, a pattern that, to our knowledge, has not been documented in prior work. We also characterize the high norm of latent states at the initial sequence position and visualize the layerwise evolution of latent states. Additionally, we demonstrate the high-dimensional helical structure of GPT-2's positional embeddings and the sequence-wise geometric patterns in LLaMa. We make our code available at https://github.com/Vainateya/Feature_Geometry_Visualization.
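
As an illustration of the PCA projection step, the following minimal sketch computes a 2D PCA embedding with a plain SVD in NumPy and checks whether two labelled groups separate along the first principal component. The data are synthetic (two Gaussian clouds labelled "attn" and "mlp" purely for illustration) and say nothing about the paper's measurements; the paper's own preprocessing of the states is not reproduced, and `pca_2d` is a hypothetical helper. UMAP is not implemented here; in Python it is commonly obtained from the umap-learn package via `umap.UMAP(n_components=2).fit_transform(X)`.

```python
import numpy as np


def pca_2d(states):
    """Project an (n_samples, d) matrix onto its top-2 principal components.

    Returns the 2D coordinates and the fraction of variance each of the two components explains.
    """
    centered = states - states.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:2].T
    explained = (s**2) / np.sum(s**2)
    return coords, explained[:2]


# Synthetic stand-in for captured states: two groups in 64 dimensions whose means
# differ along one random direction. The labels are illustrative only.
rng = np.random.default_rng(0)
d, n_per_group = 64, 200
offset = rng.normal(size=d)
offset *= 4.0 / np.linalg.norm(offset)
attn_states = rng.normal(size=(n_per_group, d)) + offset
mlp_states = rng.normal(size=(n_per_group, d)) - offset
states = np.vstack([attn_states, mlp_states])
labels = np.array(["attn"] * n_per_group + ["mlp"] * n_per_group)

coords, explained = pca_2d(states)
for name in ("attn", "mlp"):
    # Mean PC1 coordinate per group; the sign of a principal component is arbitrary.
    print(name, round(float(coords[labels == name, 0].mean()), 2))
print("explained variance (PC1, PC2):", explained.round(3))
```

PCA is linear, so distances along PC1 keep a direct geometric meaning; UMAP is nonlinear and better at exposing local cluster structure, which is why the two are complementary views of the same states.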

Paper Structure

This paper contains 20 sections and 14 figures.

Figures (14)

  • Figure 1: The two equivalent perspectives on the Transformer architecture
  • Figure 2: Overview of the visualization pipeline: text samples first pass through the Transformer layers for latent extraction; these latent states are then organized into a structured dataset alongside metadata and reduced via dimensionality reduction into interpretable visualizations.
  • Figure 3: The six capture points within each Transformer block. Points 1 and 4 correspond to the outputs of the normalization layers (pre-attention and pre-MLP). Points 2 and 5 correspond to the outputs of the attention and MLP modules, respectively. Points 3 and 6 capture the residual stream after the attention and MLP additions.
  • Figure 4: Norms of latent states from the intermediate layers of GPT-2 and LLaMa as a function of sequence position. Norms were averaged over samples and layers. The dataset used is PG-19.
  • Figure 5: Histograms of the mean intermediate-layer latent state norm for each vocabulary token, for both GPT-2 and LLaMa. Each vocabulary token was fed individually into each model as the initial token.
  • ...and 9 more figures
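
The statistic plotted in Figure 4 (latent norm at each sequence position, averaged over samples and layers) reduces to a few array operations. This minimal sketch assumes the intermediate-layer states have already been stacked into an array of shape (n_samples, n_layers, seq_len, d_model) with equal-length, unpadded sequences; which layers count as "intermediate" is left to the caller, and `mean_norm_by_position` is a hypothetical helper. The input is random data with position 0 artificially scaled up, so it illustrates the computation only, not the paper's result.

```python
import numpy as np


def mean_norm_by_position(latents):
    """Average L2 norm per sequence position.

    latents: array of shape (n_samples, n_layers, seq_len, d_model).
    Returns an array of shape (seq_len,).
    """
    norms = np.linalg.norm(latents, axis=-1)  # (n_samples, n_layers, seq_len)
    return norms.mean(axis=(0, 1))            # average over samples and layers


# Synthetic input: 4 samples, 6 layers, 32 positions, 64 dimensions.
rng = np.random.default_rng(0)
latents = rng.normal(size=(4, 6, 32, 64))
latents[:, :, 0, :] *= 20.0  # artificially enlarge the first position

profile = mean_norm_by_position(latents)
print(profile.shape)  # (32,)
print(profile[0] / profile[1:].mean())  # ratio of first-position norm to the rest
```

The Figure 5 quantity follows the same pattern, except that each vocabulary token is fed alone as the initial token and the norm is averaged over layers only, giving one value per token to histogram.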