Penzai + Treescope: A Toolkit for Interpreting, Visualizing, and Editing Models As Data

Daniel D. Johnson

TL;DR

This paper introduces Penzai, a data-centric toolkit for interpreting and editing neural networks, and Treescope, an interactive visualization tool. Penzai represents the forward pass as a composable data structure built from declarative combinators, backed by a lightweight named axes system and JAX pytrees, so users can make WYSIWYG interventions without hooks. The work provides a library of primitive layers, a flexible selector system for tree edits, automatic visualization, a Transformer implementation, and utilities for common interpretability workflows. Together, these components support repeatable, multi-device interpretability research and rapid experimentation with model interventions.

Abstract

Much of today's machine learning research involves interpreting, modifying or visualizing models after they are trained. I present Penzai, a neural network library designed to simplify model manipulation by representing models as simple data structures, and Treescope, an interactive pretty-printer and array visualizer that can visualize both model inputs/outputs and the models themselves. Penzai models are built using declarative combinators that expose the model forward pass in the structure of the model object itself, and use named axes to ensure each operation is semantically meaningful. With Penzai's tree-editing selector system, users can both insert and replace model components, allowing them to intervene on intermediate values or make other edits to the model structure. Users can then get immediate feedback by visualizing the modified model with Treescope. I describe the motivation and main features of Penzai and Treescope, and discuss how treating the model as data enables a variety of analyses and interventions to be implemented as data-structure transformations, without requiring model designers to add explicit hooks.
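The "model as data" idea in the abstract can be illustrated with a toy sketch in plain Python. This is not Penzai's API: the `Scale`, `AddBias`, and `Sequential` classes and the `insert_after` helper are hypothetical names invented for this example. The sketch only shows how, once the forward pass is exposed in the structure of the model object, an intervention becomes an ordinary data-structure transformation that needs no hooks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scale:
    """Hypothetical primitive layer: multiplies its input by a constant."""
    factor: float

    def __call__(self, x):
        return x * self.factor


@dataclass(frozen=True)
class AddBias:
    """Hypothetical primitive layer: adds a constant to its input."""
    bias: float

    def __call__(self, x):
        return x + self.bias


@dataclass(frozen=True)
class Sequential:
    """Hypothetical combinator: runs its sublayers in order."""
    sublayers: tuple

    def __call__(self, x):
        for layer in self.sublayers:
            x = layer(x)
        return x


def insert_after(model, target_type, new_layer):
    """Return a copy of `model` with `new_layer` placed after every
    instance of `target_type`, recursing into nested combinators.
    The original model is left unchanged."""
    if isinstance(model, Sequential):
        out = []
        for layer in model.sublayers:
            out.append(insert_after(layer, target_type, new_layer))
            if isinstance(layer, target_type):
                out.append(new_layer)
        return Sequential(tuple(out))
    return model


# The model's structure *is* its forward pass.
model = Sequential((Scale(2.0), Sequential((AddBias(1.0), Scale(3.0)))))

# Intervene after every Scale layer by editing the tree, not by adding hooks.
edited = insert_after(model, Scale, AddBias(10.0))

print(model(1.0))   # original model, unaffected by the edit
print(edited(1.0))  # edited model, with the extra layers applied
```

Because the edit returns a new tree, the original and modified models can be run side by side and compared, which is the kind of workflow the selector system described above is meant to support.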

Paper Structure (14 sections, 8 figures)

This paper contains 14 sections, 8 figures.

Introduction
Previous Model-Manipulation Strategies
Penzai: Treating the Forward Pass as Data
Combinators and Primitive Layers
Lightweight Named Axes System
Models Are JAX Pytrees, Plus Mutable State
Selectors Enable Flexible Tree Modifications
Treescope: Automatic Visualization of Models and Array Data
Using Penzai and Treescope for Interpretability Research
Transformer Implementation
Utilities for Common Operations
Example: Finding Induction Heads In Gemma 7B
Discussion
Additional Penzai and Treescope Visualizations

Figures (8)

Figure 1: A partially-expanded Treescope rendering of a Transformer block from Penzai's implementation of the Gemma 7B model (Gemma Team, 2024), showing the pz.nn.Attention combinator and some of the primitive sublayers it contains.
Figure 2: When pretty-printing a Penzai Linear layer, Treescope renders an inline faceted visualization of the parameter array.
Figure 3: A modified Transformer block, where the feed-forward layer has been replaced with a LinearizeAndAdjust combinator (which computes a linear approximation of its target layer) and a RewireComputationPaths operation (which copies activations across a named "worlds" batch axis).
Figure 4: A visualization of a rank-3 array of logit differences using Treescope (from the intervention in Figure 3), with a mouse tooltip giving more information about a specific array element.
Figure 5: The Gemma 7B open-weights model (Gemma Team, 2024), loaded using Penzai's transformer implementation and visualized using Treescope. The mouse cursor is hovering over a "copy path" button, which copies the location of the selected object to the clipboard when clicked.
...and 3 more figures
