generAItor: Tree-in-the-Loop Text Generation for Language Model Explainability and Adaptation
Thilo Spinner, Rebecca Kehlbeck, Rita Sevastjanova, Tobias Stähle, Daniel A. Keim, Oliver Deussen, Mennatallah El-Assady
TL;DR
The paper addresses the limited explainability, comparability, and adaptability of large language models by introducing generAItor, a visual analytics framework centered on a beam search tree (BST) and augmented with task-specific widgets. This tree-in-the-loop paradigm lets users generate, explore, compare, and adapt model outputs through interactive visualizations such as keyword coloring, sentiment shading, and an ontology-driven Voronoi treemap. It also supports model prompting, fine-tuning, and comparative analyses across prompts. The authors provide a web-based implementation and validate it through a case study on gender bias, two qualitative user studies with non-experts and linguists, and a quantitative evaluation of adaptation with few-shot data, reporting benefits for bias analysis, usability, and domain adaptation. The approach enables non-technical users and linguistic experts to reason about model behavior, uncover biases, and iteratively steer models toward their intents, with potential for transfer to existing interfaces and state-of-the-art models.
Abstract
Large language models (LLMs) are widely deployed in various downstream tasks, e.g., auto-completion, aided writing, or chat-based text generation. However, the output candidates considered by the underlying search algorithm are under-explored and under-explained. We tackle this shortcoming by proposing a tree-in-the-loop approach, where a visual representation of the beam search tree is the central component for analyzing, explaining, and adapting the generated outputs. To support these tasks, we present generAItor, a visual analytics technique that augments the central beam search tree with various task-specific widgets, which provide targeted visualizations and interaction possibilities. Our approach allows interactions on multiple levels and offers an iterative pipeline that encompasses generating, exploring, and comparing output candidates, as well as fine-tuning the model based on adapted data. Our case study shows that our tool generates new insights in gender bias analysis beyond state-of-the-art template-based methods. Additionally, we demonstrate the applicability of our approach in a qualitative user study. Finally, we quantitatively evaluate how well the model adapts to the small number of samples that typically occur in text-generation use cases.
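To make the notion of a beam search tree concrete, the following is a minimal sketch of beam search that keeps every expanded candidate in a tree rather than returning only the final sequences. It is an illustration, not generAItor's implementation: the `TOY_MODEL` lookup table stands in for a real language model, and the names `Node`, `beam_search_tree`, and `path` are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical next-token probabilities standing in for a language model.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"doctor": 0.5, "nurse": 0.3, "tree": 0.2},
    "a": {"doctor": 0.4, "nurse": 0.4, "tree": 0.2},
    "doctor": {"said": 0.7, "left": 0.3},
    "nurse": {"said": 0.6, "left": 0.4},
    "tree": {"fell": 0.9, "left": 0.1},
}


@dataclass
class Node:
    token: str
    logprob: float  # cumulative log-probability of the path from the root
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    kept: bool = False  # True if the node survived pruning at its depth


def beam_search_tree(model, start="<s>", beam_width=2, steps=3):
    """Run beam search and return (root, final beam), retaining pruned nodes."""
    root = Node(start, 0.0, kept=True)
    beam = [root]
    for _ in range(steps):
        candidates = []
        for node in beam:
            for tok, p in model.get(node.token, {}).items():
                child = Node(tok, node.logprob + math.log(p), parent=node)
                node.children.append(child)  # stays in the tree even if pruned
                candidates.append(child)
        if not candidates:
            break
        candidates.sort(key=lambda n: n.logprob, reverse=True)
        beam = candidates[:beam_width]
        for n in beam:
            n.kept = True
    return root, beam


def path(node):
    """Return the token sequence from the root down to the given node."""
    tokens = []
    while node is not None:
        tokens.append(node.token)
        node = node.parent
    return tokens[::-1]


if __name__ == "__main__":
    root, beam = beam_search_tree(TOY_MODEL, beam_width=2, steps=3)
    for leaf in beam:
        print(" ".join(path(leaf)), round(math.exp(leaf.logprob), 3))
```

Because pruned candidates remain attached to their parents, the returned tree exposes the alternatives the search considered at each step, which is the kind of information a tree-centered view can surface for exploration and comparison.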
