generAItor: Tree-in-the-Loop Text Generation for Language Model Explainability and Adaptation

Thilo Spinner, Rebecca Kehlbeck, Rita Sevastjanova, Tobias Stähle, Daniel A. Keim, Oliver Deussen, Mennatallah El-Assady

TL;DR

The paper tackles the lack of explainability, comparability, and adaptability in large language models by introducing generAItor, a visual analytics framework that centers on a beam search tree (BST) and augments it with task-specific widgets. This tree-in-the-loop paradigm enables users to generate, explore, compare, and adapt model outputs through interactive visualizations such as keyword coloring, sentiment shading, and an ontology-driven Voronoi treemap, while supporting model prompting, fine-tuning, and comparative analyses across prompts. The authors provide a web-based implementation and validate it through a case study on gender bias, two qualitative user studies with non-experts and linguists, and a quantitative evaluation of adaptation with few-shot data, showing meaningful gains in bias analysis, usability, and domain adaptation. The approach demonstrates practical impact by enabling non-technical users and linguistic experts to reason about model behavior, uncover biases, and iteratively steer models toward user intents, with potential for transfer to existing interfaces and state-of-the-art models.

Abstract

Large language models (LLMs) are widely deployed in various downstream tasks, e.g., auto-completion, aided writing, or chat-based text generation. However, the considered output candidates of the underlying search algorithm are under-explored and under-explained. We tackle this shortcoming by proposing a tree-in-the-loop approach, where a visual representation of the beam search tree is the central component for analyzing, explaining, and adapting the generated outputs. To support these tasks, we present generAItor, a visual analytics technique, augmenting the central beam search tree with various task-specific widgets, providing targeted visualizations and interaction possibilities. Our approach allows interactions on multiple levels and offers an iterative pipeline that encompasses generating, exploring, and comparing output candidates, as well as fine-tuning the model based on adapted data. Our case study shows that our tool generates new insights in gender bias analysis beyond state-of-the-art template-based methods. Additionally, we demonstrate the applicability of our approach in a qualitative user study. Finally, we quantitatively evaluate the adaptability of the model to few samples, as occurring in text-generation use cases.
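
To make the central data structure more concrete, the following is a minimal sketch of a beam search tree, not the authors' implementation. The toy next-token table `TOY_MODEL`, the `Node` class, and the function `beam_search_tree` are hypothetical names introduced only for illustration; a real system would query a language model for the next-token probabilities. Each node stores the probability of following its predecessor, and the leaf of the beam with the highest overall probability is returned as the head.

```python
import math
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy "model": maps the last token to next-token probabilities.
# A real setup would obtain these values from a language model.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.9, "dog": 0.1},
    "cat": {"sleeps": 0.7, "runs": 0.3},
    "dog": {"sleeps": 0.2, "runs": 0.8},
}


@dataclass
class Node:
    token: str
    prob: float                      # probability of following the parent node
    logp: float                      # cumulative log-probability of the sequence
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)

    def sequence(self):
        """Return the tokens from the root down to this node."""
        node, tokens = self, []
        while node is not None:
            tokens.append(node.token)
            node = node.parent
        return tokens[::-1]


def beam_search_tree(start, beam_width, depth):
    """Expand a tree from `start`, keeping `beam_width` beams per step."""
    root = Node(start, 1.0, 0.0)
    beams = [root]
    for _ in range(depth):
        candidates = []
        for node in beams:
            for token, p in TOY_MODEL.get(node.token, {}).items():
                candidates.append(Node(token, p, node.logp + math.log(p), node))
        if not candidates:
            break
        # Keep only the most probable sequences and attach them to the tree.
        candidates.sort(key=lambda n: n.logp, reverse=True)
        beams = candidates[:beam_width]
        for child in beams:
            child.parent.children.append(child)
    # The head is the leaf of the beam with the highest overall probability.
    head = max(beams, key=lambda n: n.logp)
    return root, head


root, head = beam_search_tree("<s>", beam_width=2, depth=3)
print(head.sequence(), round(math.exp(head.logp), 3))
```

In this toy setup, the head sequence starts with "a" although "the" is the more probable first token. Inspecting the tree, rather than only the final output, is what makes such alternative candidates visible.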

Paper Structure

This paper contains 84 sections, 17 figures, and 2 tables.

Figures (17)

  • Figure 1: The beam search tree visualization. Edge width and label encode the probability that a node follows its predecessor. The leaf node of the beam with the highest overall probability is marked as Head. Keywords are highlighted using semantic colors. The branch color encodes the sentiment of the sequence up to a node.
  • Figure 2: The five main tasks of interactive text generation as supported by generAItor. The beam search tree is the key element, facilitating visualization and interaction with the model's decisions. Each task has a set of associated widgets, providing task-specific visualizations, controls, and interaction possibilities. Following our proposed tree-in-the-loop paradigm, the tasks are interwoven and can be combined in an iterative process, centered around the beam search tree.
  • Figure 3: The text generation workflow. (1) After creating a new tree and predicting with the set parameters, the model runs into a loop. By choosing a different branch, this issue can be resolved. (2) By manually editing nodes, factual knowledge can be incorporated into the text. (3) The ontology tree gives an overview of concepts connected to the generated text; (4) ontological replacements suggest alternatives.
  • Figure 4: The generAItor workspace in comparative analysis mode, with the associated widgets opened. The tree visualization as the central element shows alternative beam search results under different replacements of the <PH> node. Words occurring in one of the selected word lists are highlighted in the tree. The UpSet plot shows the overlap of the selected word lists in the alternative trees. The edges of the tree are colored based on sentiment analysis, with red indicating negative sentiment and green indicating positive sentiment.
  • Figure 5: The prompt "<PH> is great. One could say that" generates predictions mentioning different professions.
  • ...and 12 more figures