VisPile: A Visual Analytics System for Analyzing Multiple Text Documents With Large Language Models and Knowledge Graphs

Adam Coscia, Alex Endert

TL;DR

VisPile addresses the challenge of sensemaking over large text corpora by integrating large language models and knowledge graphs into a visual analytics workflow. The system supports open-ended document retrieval, pile-based grouping, and evidence validation by linking LLM outputs with KG facts, enabling analysts to compare generated and ground-truth information. The authors present design goals, an open-source implementation, and formative domain expert feedback from six intelligence professionals, highlighting benefits of LLM-KG synergy and the importance of provenance. Limitations include reliance on a single LLM and a KRONOS dataset; future work includes exploring longer documents, diverse LLMs, and improved KG alignment and trust mechanisms. The work demonstrates a practical path toward faster sensemaking with AI-augmented visual analytics in intelligence contexts.

Abstract

Intelligence analysts perform sensemaking over collections of documents using various visual and analytic techniques to gain insights from large amounts of text. As data scales grow, our work explores how to leverage two AI technologies, large language models (LLMs) and knowledge graphs (KGs), in a visual text analysis tool to enhance sensemaking and help analysts keep pace. Collaborating with intelligence community experts, we developed a visual analytics system called VisPile. VisPile integrates an LLM and a KG into various UI functions that assist analysts in grouping documents into piles, performing sensemaking tasks like summarization and relationship mapping on piles, and validating LLM- and KG-generated evidence. Our paper describes the tool, as well as feedback received from six professional intelligence analysts who used VisPile to analyze a text document corpus.

Paper Structure

This paper contains 11 sections, 4 figures.

Figures (4)

  • Figure 1: The data architecture for VisPile.
  • Figure 2: The VisPile interface. VisPile leverages an LLM and a KG to help analysts gather evidence from document collections. The general user workflow is: (1) search for documents to pile, using keywords, metadata, KG entity search, and/or LLM semantic search; (2) drag documents into piles; (3) choose an LLM and a pre-generated task to run as an LLM prompt; (4) run the LLM task and read the response; (5) repeat the process, rearranging and renaming piles to scaffold the sensemaking process.
  • Figure 3: Piles (A) allow users to group documents together and run any of nine sensemaking tasks on the documents as LLM prompts. Prompts are shown for transparency and can be adjusted with inputs for questions, entity types, and concepts, as well as a temperature slider. The KG fact list (B) shows up to 5 top-ranked facts, displayed as plain text triples in the form subject$\to$predicate$\to$object, based on the LLM response in the pile.
  • Figure 4: Extract (A), Link (B), and Suggest (C) are buttons in piles that help analysts verify and contextualize evidence from the LLM and KG.
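
The KG fact list described in Figure 3 can be illustrated with a minimal sketch. The captions do not say how VisPile ranks facts, so the scoring here (counting tokens shared between a fact and the LLM response) is purely an assumption made for illustration, as are the names Fact, tokenize, and top_facts and the conventional subject, predicate, object field order. Only the cap of 5 facts comes from the caption.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """A hypothetical KG triple; field names are illustrative."""
    subject: str
    predicate: str
    obj: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_facts(response: str, facts: list[Fact], limit: int = 5) -> list[Fact]:
    """Return up to `limit` facts, ordered by token overlap with the response.

    Assumption: overlap counting stands in for whatever ranking the real
    system uses. Facts that share no tokens with the response are dropped.
    """
    response_tokens = tokenize(response)
    scored = []
    for fact in facts:
        fact_tokens = tokenize(f"{fact.subject} {fact.predicate} {fact.obj}")
        overlap = len(fact_tokens & response_tokens)
        if overlap > 0:
            scored.append((overlap, fact))
    # sort() is stable, so facts with equal scores keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [fact for _, fact in scored[:limit]]


if __name__ == "__main__":
    kg = [
        Fact("Alice", "works_for", "Acme"),
        Fact("Bob", "lives_in", "Paris"),
        Fact("Acme", "located_in", "Berlin"),
    ]
    llm_response = "Alice works for Acme, which is based in Berlin."
    for fact in top_facts(llm_response, kg, limit=2):
        print(fact.subject, fact.predicate, fact.obj)
```

A production system would more likely match on extracted entities or embeddings than on raw tokens; the sketch only shows the shape of the step, in which an LLM response goes in and a short, ranked list of KG facts comes out for the analyst to compare against.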