Visualizing LLM Latent Space Geometry Through Dimensionality Reduction
Alex Ning, Vainateya Rangaraju, Yen-Ling Kuo
TL;DR
The paper tackles the challenge of interpreting LLM internals by visualizing the latent-space geometry of decoder-only Transformers (GPT-2 and LLaMa) with PCA and UMAP. It introduces a reproducible pipeline that captures activations at six internal points per Transformer block, processes these high-dimensional states, and projects them into interpretable 2D representations, revealing geometric structure organized by layer, sequence position, and component type. Key findings include a persistent separation between attention and MLP outputs in latent space, large norms at the initial sequence position, and distinct sequence-position effects tied to GPT-2’s learned positional embeddings and LLaMa’s rotary position embeddings (RoPE). The work advances mechanistic interpretability by identifying concrete geometric signatures of feature representations and provides a codebase for broader analysis and future studies across models and datasets.
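The projection step of such a pipeline can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes activations for one prompt have already been collected into an array of shape (layers, capture points, positions, d_model). The six capture-point names and the helpers `flatten_states` and `pca_2d` are hypothetical; the paper's actual hook locations and preprocessing live in its codebase. The sketch flattens the states while keeping layer, capture-point, and position labels for coloring, then computes a 2D PCA via SVD.

```python
import numpy as np

# Hypothetical names for six capture points per block; the paper's actual
# hook locations are defined in its codebase, not here.
CAPTURE_POINTS = ["resid_pre", "attn_in", "attn_out", "resid_mid", "mlp_out", "resid_post"]


def flatten_states(states):
    """Flatten (layers, points, positions, d_model) into (N, d_model) plus labels.

    Returns the data matrix and integer label arrays for layer, capture point,
    and sequence position, aligned row-by-row with the data matrix, so each
    projected dot can be colored by any of these attributes.
    """
    n_layers, n_points, n_pos, d_model = states.shape
    data = states.reshape(-1, d_model)  # C order: layer slowest, position fastest
    layer, point, pos = np.meshgrid(
        np.arange(n_layers), np.arange(n_points), np.arange(n_pos), indexing="ij"
    )
    return data, layer.ravel(), point.ravel(), pos.ravel()


def pca_2d(data):
    """Project rows of `data` onto their top two principal components via SVD."""
    centered = data - data.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:2].T
    explained = (s**2)[:2] / np.sum(s**2)  # fraction of variance per component
    return coords, explained


# Synthetic stand-in for captured activations (12 layers, 16 positions, d_model=768).
rng = np.random.default_rng(0)
states = rng.normal(size=(12, len(CAPTURE_POINTS), 16, 768))
data, layer, point, pos = flatten_states(states)
coords, explained = pca_2d(data)
print(coords.shape, explained)
```

For a nonlinear view, the same `data` matrix could be passed to the umap-learn package, e.g. `umap.UMAP(n_components=2).fit_transform(data)`; keeping the label arrays separate from the data lets one projection be recolored by layer, position, or component without recomputing it.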
Abstract
Large language models (LLMs) achieve state-of-the-art results across many natural language tasks, but their internal mechanisms remain difficult to interpret. In this work, we extract, process, and visualize latent state geometries in Transformer-based language models through dimensionality reduction. We capture layerwise activations at multiple points within Transformer blocks and enable systematic analysis through Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP). We run experiments on GPT-2 and LLaMa models and uncover several geometric patterns in latent space. Notably, we identify a clear separation between attention and MLP component outputs across intermediate layers, a pattern that, to our knowledge, has not been documented in prior work. We also characterize the high norm of latent states at the initial sequence position and visualize the layerwise evolution of latent states. Additionally, we show the high-dimensional helical structure of GPT-2's positional embeddings and the sequence-wise geometric patterns in LLaMa. Our code is available at https://github.com/Vainateya/Feature_Geometry_Visualization.
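One simple way to quantify the initial-position norm effect is to compare the average L2 norm of latent states at position 0 with that of later positions. The NumPy sketch below assumes hidden states for a single prompt have been stacked into shape (layers, positions, d_model). The helper names are hypothetical, and this ratio is an illustrative statistic, not necessarily the one the authors report. The example data are synthetic, with position 0 deliberately scaled up; they are not measurements from GPT-2 or LLaMa.

```python
import numpy as np


def position_norms(hidden):
    """Mean L2 norm of latent states at each sequence position.

    `hidden` has shape (layers, positions, d_model); the norm is taken over
    d_model and then averaged over layers, giving one value per position.
    """
    return np.linalg.norm(hidden, axis=-1).mean(axis=0)


def first_position_ratio(hidden):
    """Ratio of the position-0 mean norm to the mean norm of all later positions."""
    norms = position_norms(hidden)
    return norms[0] / norms[1:].mean()


# Synthetic example only: position 0 is scaled by 20 to mimic a high-norm
# initial state; these values are not taken from any real model.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(12, 32, 768))
hidden[:, 0, :] *= 20.0
norms = position_norms(hidden)
ratio = first_position_ratio(hidden)
print(norms.shape, ratio)
```

Averaging over layers gives a single per-position profile; dropping the `.mean(axis=0)` instead keeps a (layers, positions) table, which is closer to the layerwise view the abstract describes.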
