The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models

Ian Tenney, James Wexler, Jasmijn Bastings, Tolga Bolukbasi, Andy Coenen, Sebastian Gehrmann, Ellen Jiang, Mahima Pushkarna, Carey Radebaugh, Emily Reif, Ann Yuan

TL;DR

Addressing the need to interpret NLP models beyond accuracy, the paper introduces LIT, a browser-based platform that unifies local explanations, aggregate analyses, and counterfactual generation. The approach centers on a modular, framework-agnostic, spec-driven architecture that supports classification, seq2seq, and structured prediction tasks. Its contributions include a rich set of visualizations (salience maps, attention, embeddings, metrics) and on-the-fly counterfactual generation, demonstrated through sentiment, coreference bias, and generation-debugging case studies. The work emphasizes usability and extensibility, with open-source access and a roadmap for new plug-ins and sequence/structured outputs.

Abstract

We present the Language Interpretability Tool (LIT), an open-source platform for visualization and understanding of NLP models. We focus on core questions about model behavior: Why did my model make this prediction? When does it perform poorly? What happens under a controlled change in the input? LIT integrates local explanations, aggregate analysis, and counterfactual generation into a streamlined, browser-based interface to enable rapid exploration and error analysis. We include case studies for a diverse set of workflows, including exploring counterfactuals for sentiment analysis, measuring gender bias in coreference systems, and exploring local behavior in text generation. LIT supports a wide range of models (including classification, seq2seq, and structured prediction) and is highly extensible through a declarative, framework-agnostic API. LIT is under active development, with code and full documentation available at https://github.com/pair-code/lit.
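To make the idea of a declarative, spec-driven API more concrete, here is a minimal sketch of one way such a design can work: a model wrapper declares its inputs and outputs as typed specs, and the interface enables only the modules whose required output type appears in those specs. Every name below (`TextField`, `ClassProbs`, `GeneratedText`, `ToySentimentModel`, `MODULES`, `compatible_modules`) is a hypothetical illustration, not LIT's actual API, and the "model" is a toy rule rather than a trained network.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


# Hypothetical field types describing what a model consumes or emits.
@dataclass(frozen=True)
class TextField:
    """A free-text input, e.g. a sentence."""


@dataclass(frozen=True)
class ClassProbs:
    """A probability vector over a fixed label vocabulary."""
    vocab: tuple


@dataclass(frozen=True)
class GeneratedText:
    """Free-text output from a generation model."""


Spec = Dict[str, object]


class ToySentimentModel:
    """A stand-in model that declares its interface through specs (hypothetical)."""

    def input_spec(self) -> Spec:
        return {"sentence": TextField()}

    def output_spec(self) -> Spec:
        return {"probas": ClassProbs(vocab=("negative", "positive"))}

    def predict(self, inputs: Sequence[dict]) -> List[dict]:
        # Toy rule in place of a real network: the token "not" yields negative.
        outputs = []
        for ex in inputs:
            neg = "not" in ex["sentence"].lower().split()
            outputs.append({"probas": [0.9, 0.1] if neg else [0.2, 0.8]})
        return outputs


# Hypothetical module registry: each module names the output type it needs.
MODULES = {
    "classification_results": ClassProbs,
    "confusion_matrix": ClassProbs,
    "generated_text_view": GeneratedText,
}


def compatible_modules(model) -> List[str]:
    """Return modules whose required output type appears in the model's output spec."""
    out_types = {type(t) for t in model.output_spec().values()}
    return [name for name, needed in MODULES.items() if needed in out_types]


if __name__ == "__main__":
    model = ToySentimentModel()
    print(compatible_modules(model))  # ['classification_results', 'confusion_matrix']
    print(model.predict([{"sentence": "It's not the ultimate movie."}]))
    # [{'probas': [0.9, 0.1]}]
```

Because modules only inspect declared types, the same registry could serve any model that fills in the specs, regardless of the framework it was trained in; that decoupling is the point of the declarative style.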

Paper Structure

This paper contains 21 sections, 9 figures, 1 table.

Figures (9)

  • Figure 1: The LIT UI, showing a fine-tuned BERT model (Devlin et al., 2019) on the Stanford Sentiment Treebank (Socher et al., 2013) development set. The top half shows a selection toolbar and, from left to right, the embedding projector, the data table, and the datapoint editor. Tabs present different modules in the bottom half; the view shown contains classifier predictions, an attention visualization, and a confusion matrix.
  • Figure 2: Salience maps on "It's not the ultimate depression-era gangster movie.", suggesting that "not" and "ultimate" are important to the model's prediction.
  • Figure 3: Exploring a coreference model on the Winogender dataset.
  • Figure 4: Investigating a local generation error, from selecting an interesting example to finding the relevant training datapoints that led to the error.
  • Figure A.1: The counterfactual generator module, showing a set of generated datapoints in the staging area. Labels can be manually edited before the datapoints are added to the dataset. In this example, the counterfactuals were created with the word replacer by replacing the word "great" with "terrible" across the dataset.
  • ...and 4 more figures
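Figure A.1 describes the word replacer, which substitutes one word for another across the dataset and stages the results so their labels can be edited. The function below is a minimal sketch of that behavior under our own assumptions (whole-word, case-sensitive matching; labels copied unchanged); `word_replacer` and its parameters are hypothetical and do not reproduce LIT's implementation.

```python
import re
from typing import Dict, List


def word_replacer(dataset: List[Dict[str, str]], field: str,
                  old: str, new: str) -> List[Dict[str, str]]:
    """Stage one counterfactual per example containing `old` as a whole word.

    Hypothetical sketch: matching is case-sensitive on word boundaries, and
    the label is copied unchanged so a user can fix it before the
    counterfactual is added to the dataset.
    """
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    staged = []
    for ex in dataset:
        text = ex[field]
        if pattern.search(text):
            cf = dict(ex)  # shallow copy keeps other fields, e.g. the label
            # A function replacement inserts `new` literally (no backslash escapes).
            cf[field] = pattern.sub(lambda _m: new, text)
            staged.append(cf)
    return staged


if __name__ == "__main__":
    data = [
        {"sentence": "A great movie.", "label": "positive"},
        {"sentence": "Dull and slow.", "label": "negative"},
        {"sentence": "Great acting, great script.", "label": "positive"},
    ]
    staged = word_replacer(data, "sentence", "great", "terrible")
    for cf in staged:
        print(cf)
    # {'sentence': 'A terrible movie.', 'label': 'positive'}
    # {'sentence': 'Great acting, terrible script.', 'label': 'positive'}
```

The staged labels still read "positive", which is exactly why the module lets a user edit labels before committing counterfactuals; a case-insensitive variant would also rewrite the capitalized "Great".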