exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformers Models

Benjamin Hoover, Hendrik Strobelt, Sebastian Gehrmann

TL;DR

To make the learned representations of Transformer models easier to interpret, the paper introduces exBERT, an interactive visualization tool that couples attention analysis with token-embedding context via nearest-neighbor search in an annotated corpus. It combines an attention view, a corpus view, and metadata summaries to give intuitive, human-centered insights into what attention heads and embeddings encode. A case study on BERT using the Wizard of Oz corpus demonstrates how linguistic features emerge across layers and how multiple heads collaborate to capture dependencies. The work provides open-source code and a demo, enabling rapid experimentation and deeper understanding of learned representations in Transformers.

Abstract

Large language models can produce powerful contextual representations that lead to improvements across many NLP tasks. Since these models are typically guided by a sequence of learned self attention mechanisms and may comprise undesired inductive biases, it is paramount to be able to explore what the attention has learned. While static analyses of these models lead to targeted insights, interactive tools are more dynamic and can help humans better gain an intuition for the model-internal reasoning process. We present exBERT, an interactive tool named after the popular BERT language model, that provides insights into the meaning of the contextual representations by matching a human-specified input to similar contexts in a large annotated dataset. By aggregating the annotations of the matching similar contexts, exBERT helps intuitively explain what each attention-head has learned.
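
The matching-and-aggregation idea in the abstract can be illustrated with a minimal sketch. It is not exBERT's implementation: the toy corpus, the three-dimensional embeddings, the use of cosine similarity, and the names `nearest_neighbors` and `summarize_pos` are all assumptions made for illustration. The sketch finds the corpus tokens whose embeddings are closest to a query embedding, then counts the part-of-speech annotations of those matches.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical annotated corpus: (token, POS tag, toy embedding).
# Real contextual embeddings would have hundreds of dimensions.
CORPUS = [
    ("run", "VERB", [0.9, 0.1, 0.0]),
    ("flee", "VERB", [0.8, 0.2, 0.1]),
    ("escape", "NOUN", [0.2, 0.9, 0.1]),
    ("exit", "NOUN", [0.1, 0.8, 0.3]),
    ("the", "DET", [0.0, 0.1, 0.9]),
]

def nearest_neighbors(query, corpus, k):
    """Return the k corpus entries most similar to the query embedding."""
    ranked = sorted(corpus, key=lambda entry: cosine(query, entry[2]), reverse=True)
    return ranked[:k]

def summarize_pos(matches):
    """Aggregate the POS annotations of the matched tokens into a histogram."""
    return Counter(pos for _, pos, _ in matches)

# Assumed embedding of a user-selected token (illustrative values only).
query = [0.85, 0.15, 0.05]
matches = nearest_neighbors(query, CORPUS, k=3)
summary = summarize_pos(matches)
```

In this toy setup the histogram in `summary` plays the role of the metadata summaries: if most neighbors of a token share one annotation, that suggests the embedding encodes the corresponding linguistic feature.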

Paper Structure

This paper contains 14 sections, 2 equations, 4 figures.

Figures (4)

  • Figure 1: An overview of the different components of the tool. The token "escape" is selected and masked at 0-[all]. The results from a corpus search by token embedding are shown and summarized in (d-g). Users can enter a sentence in (a) and modify the attention view through selections in (b). Self attention is displayed in (c). The blue matrices show the attention of a head (column) to a token (row). Tokens and heads that are selected in (c) can be searched over the annotated corpus (shown: Wizard of Oz) with results presented in (d). Every token in (d) displays its linguistic metadata on hover. A colored summary of the matched token (black highlight) and its context is shown in (e), which can be expanded or collapsed with the buttons above it. The histograms in (f) and (g) summarize the metadata of the results in (d) for the matched token and the token of max attention, respectively.
  • Figure 2: Left: searching by token embedding results. Histogram summaries shown at layers 5 (a) and 6 (b). Right: histogram summaries of searching by different head selections at layer 5.
  • Figure 3: Exploration of positional heads, inspecting the positional head 2-0. The overview in (a) shows the head behavior always pointing to the following word, and the search token "of" is highlighted in (b). The matched results are summarized by POS and offset in (c) and by DEP in (d).
  • Figure 4: Token embeddings for the "escape" token setup in \ref{sec:btm} across every layer. The matched tokens at the output of the model are shown in the bottom corpus inspector view, whereas summaries are shown for all the other layers.
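
The offset summary mentioned in the Figure 3 caption can be sketched as follows. This is an illustrative example under stated assumptions, not code from the tool: the attention matrix is invented, rows are assumed to be attending tokens and columns attended-to tokens for a single head, and `max_attention_offsets` is a hypothetical helper name. For each token it records how far away the token with maximum attention sits, so a head that points to the following word yields mostly +1.

```python
from collections import Counter

def max_attention_offsets(attention):
    """For each row i, return (index of the max-attention column) - i."""
    offsets = []
    for i, row in enumerate(attention):
        j = max(range(len(row)), key=row.__getitem__)
        offsets.append(j - i)
    return offsets

# Invented attention weights for one head over a four-token sentence.
# Each row sums to 1; the last token has no following word to attend to.
attention = [
    [0.1, 0.8, 0.1, 0.0],
    [0.0, 0.1, 0.9, 0.0],
    [0.1, 0.0, 0.2, 0.7],
    [0.2, 0.1, 0.1, 0.6],
]

offsets = max_attention_offsets(attention)
histogram = Counter(offsets)
```

A histogram dominated by a single offset is the kind of pattern that marks a head as positional rather than driven by token content.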