ICX360: In-Context eXplainability 360 Toolkit

Dennis Wei, Ronny Luss, Xiaomeng Hu, Lucas Monteiro Paes, Pin-Yu Chen, Karthikeyan Natesan Ramamurthy, Erik Miehling, Inge Vejsbjerg, Hendrik Strobelt

TL;DR

ICX360 addresses the need for in-context explainability of LLM outputs grounded in user-provided context. It implements three recent methods (MExGen, CELL, and Token Highlighter) that span black-box and white-box attributions and include contrastive explanations, all within a unified implementation framework. The toolkit modularizes explainers, model wrappers, perturbation infillers, scalarizers, segmenters, and evaluation metrics, so that these components can be combined to produce scalable, context-aware explanations for generation tasks. It targets both high-stakes and everyday LLM applications, offering a flexible platform for retrieval-augmented generation, natural language generation, and related tasks, and it notes areas for future enhancement.

Abstract

Large Language Models (LLMs) have become ubiquitous in everyday life and are entering higher-stakes applications ranging from summarizing meeting transcripts to answering doctors' questions. As was the case with earlier predictive models, it is crucial that we develop tools for explaining the output of LLMs, be it a summary, list, response to a question, etc. With these needs in mind, we introduce In-Context Explainability 360 (ICX360), an open-source Python toolkit for explaining LLMs with a focus on the user-provided context (or prompts in general) that is fed to the LLMs. ICX360 contains implementations of three recent tools that explain LLMs using both black-box and white-box methods (via perturbations and gradients, respectively). The toolkit, available at https://github.com/IBM/ICX360, contains quick-start guidance materials as well as detailed tutorials covering use cases such as retrieval-augmented generation, natural language generation, and jailbreaking.
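
To make the black-box, perturbation-based idea concrete, here is a minimal leave-one-out sketch. It is not ICX360's API and not the MExGen or CELL algorithm: `split_sentences`, `toy_model`, `output_change`, and `leave_one_out` are hypothetical names chosen for illustration, and `toy_model` is a deterministic stand-in for an LLM call. The sketch only assumes the general recipe suggested by the component list: segment the context, perturb one unit at a time, and reduce the change in the generated text to a single number.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Hypothetical segmenter: split the context into sentence-level units."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def toy_model(context: str) -> str:
    """Stand-in for an LLM call: returns the first sentence mentioning 'Paris'."""
    for sentence in split_sentences(context):
        if "Paris" in sentence:
            return sentence
    return "I do not know."


def output_change(original: str, perturbed: str) -> float:
    """Hypothetical scalarizer: 1 minus the Jaccard overlap of the word sets."""
    a, b = set(original.lower().split()), set(perturbed.lower().split())
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def leave_one_out(
    context: str,
    model: Callable[[str], str],
    scalarizer: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score each context unit by how much the output changes when it is dropped."""
    units = split_sentences(context)
    baseline = model(context)
    scores = []
    for i in range(len(units)):
        # Perturbation: remove unit i and regenerate from the remaining context.
        perturbed_context = " ".join(units[:i] + units[i + 1:])
        scores.append(scalarizer(baseline, model(perturbed_context)))
    return list(zip(units, scores))


if __name__ == "__main__":
    ctx = "The weather is mild. Paris is the capital of France. Trains run hourly."
    for unit, score in leave_one_out(ctx, toy_model, output_change):
        print(f"{score:.2f}  {unit}")
```

In this toy setting only the sentence the output depends on receives a nonzero score. A real explainer would replace `toy_model` with calls to an actual LLM and would likely use a softer scalarizer than word overlap.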

Paper Structure

This paper contains 19 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: A two-dimensional view of the space of in-context explanations, with level of access to the LLM on the horizontal axis and input granularity on the vertical axis. Current methods in ICX360 are situated within this plane. The downward arrow for MExGen indicates that it can proceed "top-down" from coarser levels of granularity to finer ones, while Token Highlighter is "bottom-up". Not shown is a third dimension of output granularity; all three methods operate at the level of output phrases or sentences.
  • Figure 2: Token Highlighter code snippet
  • Figure 3: MExGen code snippet
  • Figure 4: CELL code snippet