Steering LLM Summarization with Visual Workspaces for Sensemaking
Xuxin Tang, Eric Krokos, Can Liu, Kylie Davidson, Kirsten Whitley, Naren Ramakrishnan, Chris North
TL;DR
This work introduces space-steered summarization, a framework that uses a human-created visual workspace as an intermediate step to guide LLM-based sensemaking. By transforming workspace-derived text-level, insight-level, structure-level, and connection information into prompts, the method externalizes and structures human reasoning to steer summarization. Across proof-of-concept experiments with GPT-4o and a ground-truth dataset, workspace-enabled prompts substantially improve alignment with the ground truth, outperforming prompts that contain only documents. The findings highlight the potential of visual workspaces to accelerate human-AI collaboration in complex, multi-document sensemaking tasks and pave the way for interactive, mixed-initiative summarization systems with external memory for LLMs.
Abstract
Large Language Models (LLMs) are widely used for summarization because they generate high-quality text quickly. Summarization for sensemaking involves both information compression and insight extraction. Human guidance in sensemaking tasks can help LLMs prioritize and cluster relevant information. However, users must translate their thinking into natural language to communicate with LLMs. Can more readable and operable visual representations guide the summarization process for sensemaking instead? To explore this question, we propose an intermediate step before LLM generation, a schematic visual workspace for human sensemaking, to steer and refine the summarization process. We conduct a series of proof-of-concept experiments to investigate the potential of visual workspaces to enhance summarization by GPT-4. Using a textual sensemaking dataset with a ground-truth summary, we evaluate the impact of a human-generated visual workspace on LLM-generated summaries of the dataset and assess the effectiveness of space-steered summarization. We categorize several types of information that can be extracted from typical human workspaces and injected into engineered prompts to steer LLM summarization. The results demonstrate how such workspaces can help align an LLM with the ground truth, yielding more accurate summaries than those produced without the workspaces.
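To make the idea of injecting workspace information into a prompt concrete, the following is a minimal sketch, not the authors' implementation. The `Workspace` class, its field names, the `build_prompt` function, and the section labels are all hypothetical; they simply mirror the four kinds of workspace-derived information named in the TL;DR (text-level, insight-level, structure-level, and connection information). The sketch only assembles prompt text and does not call any LLM.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Workspace:
    """Hypothetical container for information extracted from a visual workspace."""
    highlights: list[str] = field(default_factory=list)           # text-level
    notes: list[str] = field(default_factory=list)                # insight-level
    clusters: dict[str, list[str]] = field(default_factory=dict)  # structure-level
    links: list[tuple[str, str]] = field(default_factory=list)    # connection


def build_prompt(documents: dict[str, str],
                 workspace: Optional[Workspace] = None) -> str:
    """Build a document-only prompt, or a workspace-steered one if given."""
    parts = ["Summarize the key findings in the documents below."]
    for doc_id, text in documents.items():
        parts.append(f"[Document {doc_id}]\n{text}")

    if workspace is None:
        # Baseline condition: the prompt contains only the documents.
        return "\n\n".join(parts)

    # Steered condition: append one section per non-empty information type.
    sections = []
    if workspace.highlights:
        sections.append("Highlighted text:\n" +
                        "\n".join(f"- {h}" for h in workspace.highlights))
    if workspace.notes:
        sections.append("Analyst notes:\n" +
                        "\n".join(f"- {n}" for n in workspace.notes))
    if workspace.clusters:
        sections.append("Document clusters:\n" +
                        "\n".join(f"- {name}: {', '.join(ids)}"
                                  for name, ids in workspace.clusters.items()))
    if workspace.links:
        sections.append("Connections between documents:\n" +
                        "\n".join(f"- {a} -- {b}" for a, b in workspace.links))

    return "\n\n".join(parts + sections)
```

Calling `build_prompt(documents)` gives the document-only baseline, while `build_prompt(documents, workspace)` appends the workspace sections after the same document text, so the two conditions differ only in the injected workspace information.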
