GLUScope: A Tool for Analyzing GLU Neurons in Transformer Language Models

Sebastian Gerstner; Hinrich Schütze

TL;DR

GLUScope is an open-source tool for interpretability researchers that analyzes neurons in Transformer-based language models. It targets more recent models than previous tools do and handles gated activation functions such as SwiGLU.

Abstract

We present GLUScope, an open-source tool for analyzing neurons in Transformer-based language models, intended for interpretability researchers. We focus on more recent models than previous tools do; specifically, we consider gated activation functions such as SwiGLU. This introduces a new challenge: understanding positive activations is not enough. Instead, both the gate activation and the "in" activation of a neuron can be positive or negative, leading to four possible sign combinations that in some cases have quite different functionalities. Accordingly, for any neuron, our tool shows text examples for each of the four sign combinations and indicates how often each combination occurs. We describe examples of how our tool can lead to novel insights. A demo is available at https://sjgerstner.github.io/gluscope.
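
To make the four sign combinations concrete, the following is a minimal sketch, assuming the common SwiGLU form in which a neuron's output activation is swish(x_gate) · x_in. The function names are hypothetical illustrations and not part of GLUScope's code or API. The sketch computes the output activation of a single neuron over a batch of tokens and the fraction of tokens that fall into each (gate sign, in sign) combination.

```python
import numpy as np


def swish(z):
    """Swish (SiLU): z * sigmoid(z). Its sign always matches the sign of z."""
    z = np.asarray(z, dtype=float)
    return z / (1.0 + np.exp(-z))


def glu_neuron(x_gate, x_in):
    """Assumed SwiGLU neuron output: swish(x_gate) * x_in, per token."""
    return swish(x_gate) * np.asarray(x_in, dtype=float)


def sign_combination_frequencies(x_gate, x_in):
    """Fraction of tokens in each (gate sign, in sign) combination.

    Tokens where either activation is exactly zero are counted in no
    combination, so the four fractions may sum to less than 1.
    """
    x_gate = np.asarray(x_gate, dtype=float)
    x_in = np.asarray(x_in, dtype=float)
    masks = {
        ("+", "+"): (x_gate > 0) & (x_in > 0),
        ("+", "-"): (x_gate > 0) & (x_in < 0),
        ("-", "+"): (x_gate < 0) & (x_in > 0),
        ("-", "-"): (x_gate < 0) & (x_in < 0),
    }
    return {combo: float(mask.mean()) for combo, mask in masks.items()}


if __name__ == "__main__":
    # Hypothetical per-token activations of one neuron (random, for illustration only).
    rng = np.random.default_rng(0)
    gate = rng.normal(size=1000)
    inp = rng.normal(size=1000)
    for combo, freq in sign_combination_frequencies(gate, inp).items():
        print(combo, round(freq, 3))
```

Because swish preserves the sign of its input, the output activation under this assumed form is positive exactly when the gate and in activations share a sign. A positive output can therefore come from either the (+, +) or the (-, -) combination, which is one reason to inspect the four cases separately.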

Paper Structure (22 sections, 4 equations, 4 figures)

Introduction
Preliminaries
Gated activation functions (GLU variants)
Weight preprocessing
Related work
Neuron analysis research
Neuron analysis tools
Related code libraries
Released artifacts
Dolma subset
Activation dataset
Our demo website: GLUScope
Summary statistics
Text examples
Usage examples
...and 7 more sections

Figures (4)

Figure 1: Overview of a neuron page (neuron 31.9634 of OLMo-7B-0424).
Figure 2: Within each sign combination we show examples for the four intermediate activations "hook_post", "hook_pre_linear", "hook_pre" and "swish". See the "Summary statistics" section for definitions.
Figure 3: A single example of a strong activation of neuron 31.9634 in the case $x_{\text{gate}}<0$, $x_{\text{in}}<0$. The colored tokens are those where this sign combination occurred for this neuron. Via the top bar it is possible to display all intermediate activations as well. Hovering over a token shows the exact activation values.
Figure 4: Code for the dataset example (section label `sec:dataset-example`).
