Empirical Insights into Analytic Provenance Summarization: A Study on Segmenting Data Analysis Workflows

Shaghayegh Esmaeili, Irelis D. Suarez, Ezekiel Ajayi, Eric D. Ragan

TL;DR

An empirical study explores how users naturally present, communicate, and summarize visual data analysis activities, and uncovers key patterns and high-level categories that inform users' decisions when segmenting analytic workflows.

Abstract

The complexity of exploratory data analysis poses significant challenges for collaboration and effective communication of analytic workflows. Automated methods can alleviate these challenges by summarizing workflows into more interpretable segments, but designing effective provenance-summarization algorithms depends on understanding the factors that guide how humans segment their analysis. To address this, we conducted an empirical study that explores how users naturally present, communicate, and summarize visual data analysis activities. Our qualitative analysis uncovers key patterns and high-level categories that inform users' decisions when segmenting analytic workflows, revealing the nuanced interplay between data-driven actions and strategic thinking. These insights provide a robust empirical foundation for algorithm development and highlight critical factors that must be considered to enhance the design of visual analytics tools. By grounding algorithmic decisions in human behavior, our findings offer valuable contributions to developing more intuitive and practical tools for automated summarization and clear presentation of analytic provenance.

Paper Structure

This paper contains 28 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Overview of the main steps of this research (designated by the orange bracket) within the context of the broader research goals: (a) Prior user studies provided sample provenance data. (b) Sample provenance records were cleaned and aggregated to create a provenance data repository, including user interaction logs and think-aloud comments (section \ref{data-sample}). (c) A new user study on the summarization of analytic workflows forms the core of the research in this paper. (d) We collected participants' summaries, in the form of note cards, as the main data. (e) We qualitatively analyzed the collected data to extract patterns, themes, and features related to the summarization rationale (section \ref{analysis}). (f) Our research findings can inform the development and tuning of automated summarization algorithms. (g) This research will enable improved human-understandable summaries of analytic provenance for communication and presentation. The paper focuses on steps (c), (d), and (e), and it provides fundamental results and themes for step (f).
  • Figure 2: The setting in which provenance data was collected for this paper's study (i.e., the summarization user study). Left: Partial display of the visual provenance tool used for the data analysis scenario. Analysts can use it for reading documents, highlighting, searching, etc. Events and the analyst's actions were recorded by the system in the background. Right: Prior study in which an analyst used the tool for a data analysis scenario. We used the recorded logs as sample provenance data in our summarization study.
  • Figure 3: Overview of the main steps of our qualitative data analysis. The arrow between steps 3 and 4 indicates the iterative cycle between coding data and discussing them to reach an agreement. This cycle ends when no further change is needed for either the coding scheme or coded data.
  • Figure 4: This histogram shows the total number of cards (per participant). The blue dashed line shows the mean (8.7 cards). Across all participants, there were 157 summary cards.
  • Figure 5: This histogram shows the duration of cards across all participants. The red dashed line shows the mean (8.08 minutes). The min and max are 0.06 and 30.28 minutes.
  • ...and 1 more figures