Supporting Sensemaking of Large Language Model Outputs at Scale
Katy Ilonka Gero, Chelse Swoopes, Ziwei Gu, Jonathan K. Kummerfeld, Elena L. Glassman
TL;DR
This work addresses how to make sense of many LLM outputs at once, at the meso scale, by introducing five text-analysis and rendering features, including the novel Positional Diction Clustering (PDC). The design is grounded in Variation Theory and Analogical Learning Theory and is evaluated through a controlled user study (n=24) and eight case studies, which show that the features support diverse sensemaking tasks and reveal insights that traditional single-output UIs cannot reach. Key contributions include a practical interface offering exact matches, unique words, PDC-based grids, and interleaved renderings, plus explicit design guidelines for future LLM inspection interfaces. The findings suggest that preserving full-text outputs, avoiding predefined lenses, and pre-computing cross-document relationships enable richer, more scalable analysis of LLM-generated content, with broad implications for end-user workflows, model auditing, and prompt engineering.
Abstract
Large language models (LLMs) are capable of generating multiple responses to a single prompt, yet little effort has been expended to help end-users or system designers make use of this capability. In this paper, we explore how to present many LLM responses at once. We design five features, which include both pre-existing and novel methods for computing similarities and differences across textual documents, as well as how to render their outputs. We report on a controlled user study (n=24) and eight case studies evaluating these features and how they support users in different tasks. We find that the features support a wide variety of sensemaking tasks and even make tractable some tasks that our participants previously considered too difficult. Finally, we present design guidelines to inform future explorations of new LLM interfaces.
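
To make the idea of computing similarities and differences across responses concrete, the following is a minimal Python sketch of two of the simpler features named in the TL;DR, exact matches and unique words. It is an illustration under stated assumptions, not the paper's implementation: here an "exact match" is assumed to be a sentence that appears verbatim in at least two responses, and a "unique word" is assumed to be a word that occurs in only one response. The function names, the tokenization rule, and the sample responses are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens; the paper may tokenize differently.
    return re.findall(r"[a-z0-9']+", text.lower())


def unique_words(responses):
    """For each response, list the words that appear in no other response."""
    token_sets = [set(tokenize(r)) for r in responses]
    # Document frequency: in how many responses does each word occur?
    doc_freq = Counter(w for s in token_sets for w in s)
    return [sorted(w for w in s if doc_freq[w] == 1) for s in token_sets]


def exact_matches(responses, min_count=2):
    """Sentences that appear verbatim in at least `min_count` responses."""
    doc_freq = Counter()
    for r in responses:
        # Naive sentence split on terminal punctuation followed by whitespace.
        sentences = {s.strip() for s in re.split(r"(?<=[.!?])\s+", r) if s.strip()}
        doc_freq.update(sentences)
    return sorted(s for s, n in doc_freq.items() if n >= min_count)


# Hypothetical responses to a single prompt.
responses = [
    "Paris is the capital of France. It lies on the Seine.",
    "Paris is the capital of France. It is famous for the Louvre.",
    "Paris is the capital of France. Millions visit each year.",
]

shared = exact_matches(responses)
distinct = unique_words(responses)
```

In this toy input, the shared opening sentence is what an exact-match view would mark as common to all responses, while the per-response unique words point to where each response diverges. Both results are computed once across the whole set, in the spirit of pre-computing cross-document relationships.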
