Transformer Explainer: Interactive Learning of Text-Generative Models

Aeree Cho, Grace C. Kim, Alexander Karpekov, Alec Helbling, Zijie J. Wang, Seongmin Lee, Benjamin Hoover, Duen Horng Chau

TL;DR

The paper addresses how hard Transformers are for non-experts to understand by introducing Transformer Explainer, an open-source, browser-based visualization tool built around GPT-2. The tool combines a model overview with multiple levels of abstraction to show how input text flows through embedding, Transformer blocks, attention, and next-token prediction, all backed by a live GPT-2 instance running in the browser. Its contributions include a Sankey-diagram-driven visualization of data flow, real-time interaction with the temperature parameter to explore how deterministic the predictions are, and smooth transitions between high-level and low-level explanations. Because it needs no special hardware or software setup, the tool broadens access to modern generative AI concepts and serves as a platform for hands-on learning.

Abstract

Transformers have revolutionized machine learning, yet their inner workings remain opaque to many. We present Transformer Explainer, an interactive visualization tool designed for non-experts to learn about Transformers through the GPT-2 model. Our tool helps users understand complex Transformer concepts by integrating a model overview and enabling smooth transitions across abstraction levels of mathematical operations and model structures. It runs a live GPT-2 instance locally in the user's browser, empowering users to experiment with their own input and observe in real-time how the internal components and parameters of the Transformer work together to predict the next tokens. Our tool requires no installation or special hardware, broadening the public's education access to modern generative AI techniques. Our open-sourced tool is available at https://poloclub.github.io/transformer-explainer/. A video demo is available at https://youtu.be/ECR4oAwocjs.

Paper Structure

This paper contains 3 sections and 1 figure.

Figures (1)

  • Figure 1: The temperature slider lets users interactively experiment with the temperature parameter's impact on the next token's probability distribution. Left: lower temperatures sharpen the distribution, making outputs more predictable. Right: higher temperatures smooth the distribution, resulting in less predictable outputs.
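
The sharpening and smoothing described in the Figure 1 caption can be sketched in a few lines. This is a minimal illustration and not the tool's own code: it assumes the common formulation in which logits are divided by the temperature before the softmax, and the function name `softmax_with_temperature` and the example logits are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into probabilities after dividing by the temperature.

    Assumed formulation: p_i = exp(z_i / T) / sum_j exp(z_j / T).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate next tokens.
example_logits = [2.0, 1.0, 0.1]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(example_logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

Running the loop shows the top token's probability shrinking as the temperature rises, which matches the left and right panels of the figure: low temperatures concentrate probability on the most likely token, and high temperatures spread it across the alternatives.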