From Data Dump to Digestible Chunks: Automated Segmentation and Summarization of Provenance Logs for Communication

Jeremy E. Block; Donald Honeycutt; Brett Benda; Benjamin Rheault; Eric D. Ragan

TL;DR

Analytic sensemaking in complex analyses poses challenges for communicating process and progress across collaborators. The authors propose an automatic segmentation and summarization pipeline that breaks interaction provenance into temporally coherent segments and generates textual cards to facilitate handoffs and meta-analysis, demonstrated across five intelligence-domain datasets. The system combines a two-part visual interface (overview + card view) with a four-step processing pipeline (vectorization, alignment, segmentation, summarization) and is evaluated through expert reviews, yielding insights into audience-specific needs and design refinements. The work advances provenance communication by emphasizing meaningful, hierarchical segmentation and narrative-friendly summaries, with implications for training, collaboration, and cross-domain adoption in domains where provenance is critical yet challenging to digest.

Abstract

Communicating one's sensemaking during a complex analysis session to explain thought processes is hard, yet most intelligence analysis occurs in collaborative settings. Team members require a deeper understanding of the work being completed by their peers and subordinates, but little research has fully articulated best practices for analytic provenance consumers. This work proposes an automatic summarization technique that segments an analysis session and summarizes interaction provenance as textual blurbs to allow for meta-analysis of work done. Focusing on the domain of intelligence analysis, we demonstrate our segmentation technique using five datasets, including both publicly available and classified interaction logs. We shared our demonstration with a notoriously inaccessible population of expert reviewers with experience as United States Department of Defense analysts. Our findings indicate that the proposed pipeline effectively generates cards that display key events from interaction logs, facilitating the sharing of analysis progress. Yet reviewers also reported a need for more prominent justifications and pattern elicitation controls to communicate analysis summaries more effectively. The expert review highlights the potential of automated approaches in addressing the challenges of provenance information in complex domains. We emphasize the need for further research into provenance communication in other domains. A free copy of this paper and all supplemental materials are available at https://osf.io/j4bxt

Paper Structure (32 sections, 4 figures)

Introduction
Related work
Collaborative Visual Analysis
Analytic Provenance
Text Summarization
Design Overview
Design Rationale and Goals
System Design
Visual Interface
Session Summary Component
Card Summary View
Data Processing
Preparing Session Overview
Preprocessing for Segmentation
Preparing Summarization Cards
...and 17 more sections

Figures (4)

Figure 1: The interface we designed to help communicate analytic provenance information to new users. Notice that it contains an overarching summary (in yellow) and individual summaries (the set of cards) for segments of time to help explain the analysis process. In this paper, we present the visualization approach and demonstrate the automated technique on five different datasets.
Figure 2: A single card in list mode shows the information from a segment more completely, especially to aid debugging or a more detailed review of interaction provenance. The tooltips show the underlying data more thoroughly. For example, hovering over the list reveals the searches completed in this segment.
Figure 3: An abstraction of our segmentation and summarization methodology for interaction histories in analysis scenarios. Essentially, our technique uses A) topic models of the underlying dataset and B) the order of interactions with these documents to identify C) key breakpoints at which to segment an investigation. D) We summarize each segment as textual cards to tell the story efficiently.
Figure 4: A visual representation of the phases of the expert review used to evaluate the visualization and automatic summarization technique. A) Participants openly explored the interface with think-aloud. They then B) classified different analysis sessions before finally being C) asked questions to better understand their preferences.
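
The four processing steps named in the TL;DR (vectorization, alignment, segmentation, summarization) and abstracted in Figure 3 can be illustrated with a minimal sketch. This is not the authors' implementation: bag-of-words counts stand in for the topic models, a cosine-similarity threshold stands in for the breakpoint detection, and the function names, the `doc` and `action` log fields, the threshold value, and the sample data are all hypothetical.

```python
import math
from collections import Counter


def vectorize(docs):
    """Step 1: map each document to a term-count vector.
    Assumption: bag-of-words stands in for the paper's topic models."""
    return {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def align(log, vectors):
    """Step 2: pair each logged interaction, in order, with its document's vector."""
    return [(event, vectors[event["doc"]]) for event in log]


def segment(aligned, threshold=0.2):
    """Step 3: open a new segment whenever consecutive interactions
    touch dissimilar content (hypothetical threshold rule)."""
    segments = []
    for i, (event, vec) in enumerate(aligned):
        if i == 0 or cosine(aligned[i - 1][1], vec) < threshold:
            segments.append([])
        segments[-1].append((event, vec))
    return segments


def summarize(seg, top_n=3):
    """Step 4: describe one segment as a short textual card."""
    terms = Counter()
    for _, vec in seg:
        terms.update(vec)
    docs = sorted({event["doc"] for event, _ in seg})
    top = ", ".join(term for term, _ in terms.most_common(top_n))
    return f"Interactions: {len(seg)}; documents: {', '.join(docs)}; frequent terms: {top}"


# Hypothetical sample data: three documents and an ordered interaction log.
docs = {
    "d1": "arms shipment port arms",
    "d2": "arms dealer port",
    "d3": "bank transfer account",
}
log = [
    {"doc": "d1", "action": "open"},
    {"doc": "d2", "action": "read"},
    {"doc": "d3", "action": "search"},
]

vectors = vectorize(docs)
cards = [summarize(seg) for seg in segment(align(log, vectors))]
for card in cards:
    print(card)
```

In this toy log the first two interactions share vocabulary and fall into one segment, while the third touches unrelated content and starts a new one, so two cards are produced. A real pipeline would likely need a more robust breakpoint rule than a single pairwise threshold.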
