The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models
Ian Tenney, James Wexler, Jasmijn Bastings, Tolga Bolukbasi, Andy Coenen, Sebastian Gehrmann, Ellen Jiang, Mahima Pushkarna, Carey Radebaugh, Emily Reif, Ann Yuan
TL;DR
Addressing the need to interpret NLP models beyond accuracy, the paper introduces LIT, a browser-based platform that unifies local explanations, aggregate analyses, and counterfactual generation. The approach centers on a modular, framework-agnostic, spec-driven architecture that supports classification, seq2seq, and structured prediction tasks. Its contributions include a rich set of visualizations (salience maps, attention, embeddings, metrics) and on-the-fly counterfactual generation, demonstrated through sentiment, coreference bias, and generation-debugging case studies. The work emphasizes usability and extensibility, with open-source access and a roadmap for new plug-ins and sequence/structured outputs.
Abstract
We present the Language Interpretability Tool (LIT), an open-source platform for visualization and understanding of NLP models. We focus on core questions about model behavior: Why did my model make this prediction? When does it perform poorly? What happens under a controlled change in the input? LIT integrates local explanations, aggregate analysis, and counterfactual generation into a streamlined, browser-based interface to enable rapid exploration and error analysis. We include case studies for a diverse set of workflows, including exploring counterfactuals for sentiment analysis, measuring gender bias in coreference systems, and exploring local behavior in text generation. LIT supports a wide range of models, including classification, seq2seq, and structured prediction, and is highly extensible through a declarative, framework-agnostic API. LIT is under active development, with code and full documentation available at https://github.com/pair-code/lit.
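
To make the idea of a declarative, framework-agnostic interface concrete, the following is a minimal sketch and not LIT's actual API. Every name in it (`TextField`, `LabelField`, `SpecModel`, `compatible_views`, `toy_sentiment`, and the view names) is hypothetical and chosen for illustration only. It assumes a model can be wrapped as a plain prediction function plus declared input and output specs, so that a tool could decide which views apply without knowing the underlying framework. The last lines probe the wrapped model with a controlled change to the input, in the spirit of the counterfactual question above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class TextField:
    """Hypothetical marker type for a free-text field."""


@dataclass(frozen=True)
class LabelField:
    """Hypothetical marker type for a categorical output with a fixed vocabulary."""
    vocab: Tuple[str, ...]


class SpecModel:
    """Wraps any prediction function behind declared input and output specs."""

    def __init__(
        self,
        predict_fn: Callable[[Dict[str, str]], Dict[str, str]],
        input_spec: Dict[str, object],
        output_spec: Dict[str, object],
    ) -> None:
        self.predict_fn = predict_fn
        self.input_spec = input_spec
        self.output_spec = output_spec

    def predict(self, examples: List[Dict[str, str]]) -> List[Dict[str, str]]:
        # Validate each example against the declared input spec before predicting.
        for example in examples:
            missing = set(self.input_spec) - set(example)
            if missing:
                raise ValueError(f"example is missing fields: {sorted(missing)}")
        return [self.predict_fn(example) for example in examples]


def compatible_views(model: SpecModel) -> List[str]:
    """Selects (hypothetical) views from the specs alone, not from the model code."""
    views = []
    if any(isinstance(f, TextField) for f in model.input_spec.values()):
        views.append("text_editor")
    if any(isinstance(f, LabelField) for f in model.output_spec.values()):
        views.append("classification_results")
    return views


def toy_sentiment(example: Dict[str, str]) -> Dict[str, str]:
    """Stand-in keyword classifier; any framework's model could sit here instead."""
    negative_words = {"terrible", "boring", "bad"}
    tokens = example["sentence"].lower().split()
    label = "negative" if any(t in negative_words for t in tokens) else "positive"
    return {"label": label}


model = SpecModel(
    toy_sentiment,
    input_spec={"sentence": TextField()},
    output_spec={"label": LabelField(vocab=("negative", "positive"))},
)

# Controlled change to the input: swap a single word and compare predictions.
original = {"sentence": "A great film"}
counterfactual = {"sentence": original["sentence"].replace("great", "terrible")}
predictions = model.predict([original, counterfactual])
print(predictions)  # [{'label': 'positive'}, {'label': 'negative'}]
```

Keeping the specs as plain data is the design point of the sketch: view selection and input validation depend only on the declared fields, so the toy classifier could be replaced by a model from any framework without changing the surrounding code.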
