A Tale of Two Models: Understanding Data Workers' Internal and External Representations of Complex Data

Connor Scully-Allison; Katy Williams; Stephanie Brink; Olga Pearce; Katherine E. Isaacs

TL;DR

The paper investigates how data workers' mental models of heterogeneous data diverge from the reified data model embedded in a domain-specific library, EnsembleAPI. The authors ran a qualitative study of ten participants, using interviews, sketches, and task prompts analyzed through reflexive thematic analysis. They identify substantial diversity in mental representations and two parallel hazards that hinder analysis: incomplete mental models and misalignment with the reified model. They discuss implications for the user-centered design of data tools, including visual exploration techniques, graph-based scripting, and improved metadata structure, to bridge gaps between theory and practice and reduce engineering debt. The work highlights the value of probing stakeholder mental models early and embracing multiple representations to support complex data analysis workflows in HPC domains.

Abstract

Data workers may have a different mental model of their data than the one reified in code. Understanding the organization of their data is necessary for analyzing it, be it through scripting, visualization, or abstract thought. More complicated organizations, such as tables with attached hierarchies, may tax people's ability to think about and interact with data. To better understand and ultimately design for these situations, we conducted a study across a team of ten people working with the same reified data model. Through interviews and sketching, we probed their conception of the data model and developed themes through reflexive data analysis. Participants had diverse data models that differed from the reified data model, even among team members who had designed the model, resulting in parallel hazards that limited their ability to reason about the data. From these observations, we suggest potential design interventions for data analysis processes and tools.
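To make the phrase "tables with attached hierarchies" concrete, here is a minimal, hypothetical Python sketch. It shows a flat table of metric rows keyed by node name, alongside a separate parent-child tree over those same nodes. Every name here (`Node`, `HierarchicalTable`, `subtree_total`, the call-tree nodes, and the `time` column) is an illustrative assumption. None of it reflects EnsembleAPI's actual API or the reified model shown in Figure 1.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One vertex of the hierarchy (hypothetical: a call-tree node)."""
    name: str
    children: list["Node"] = field(default_factory=list)


@dataclass
class HierarchicalTable:
    """A flat table whose rows are keyed by the nodes of an attached tree.

    Hypothetical structure for illustration only; not EnsembleAPI's API.
    """
    root: Node
    rows: dict[str, dict[str, float]]  # node name -> {column name: value}

    def value(self, node: Node, column: str) -> float:
        # Read one cell of the table for a tree node (0.0 if the cell is absent).
        return self.rows.get(node.name, {}).get(column, 0.0)

    def subtree_total(self, node: Node, column: str) -> float:
        # This query needs both structures at once: the tree to find
        # descendants, and the table to read their values.
        return self.value(node, column) + sum(
            self.subtree_total(child, column) for child in node.children
        )


# Tiny example: main calls solve, which calls two kernels.
kernel_a = Node("kernel_a")
kernel_b = Node("kernel_b")
solve = Node("solve", [kernel_a, kernel_b])
main = Node("main", [solve])

data = HierarchicalTable(
    root=main,
    rows={
        "main": {"time": 1.0},
        "solve": {"time": 2.0},
        "kernel_a": {"time": 3.0},
        "kernel_b": {"time": 4.0},
    },
)

print(data.subtree_total(solve, "time"))  # 9.0
print(data.subtree_total(main, "time"))   # 10.0
```

The sketch deliberately keeps the tree and the table as separate objects. As a result, a question like `subtree_total` cannot be answered from either one alone. A person reasoning about such data has to hold both structures, and the link between them, in mind at the same time.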

Paper Structure (31 sections, 19 figures, 1 table)

Introduction
Background: Data Models
General Data Definitions Used
EnsembleAPI Data Model
Related Works
Mental Models
Data Abstractions and Data Models
Heterogeneous Datasets Across Domains
Study Methodology
Study Participants
Study Design
Study Procedure
Analysis
Statement of Positionality
Findings
...and 16 more sections

Figures (19)

Figure 1: The reference data model of EnsembleAPI, the project all participants were recruited from.
Figure 2: A summary of the themes we report in the Findings section.
Figure 3: Common visual idioms used by participants to represent the data model components. Tables were drawn with and without lines. Trees were mostly indented, but some were node-link. Line charts were often used to express the entirety of the dataset. (See supplementary materials for all drawings.)
Figure 4: The "timeline" of recalled data components per participant. The left axis lists interview questions in the order they were asked, from top to bottom. Initial recall was limited when we asked participants to describe their data. Additional elicitation methods (drawing, task questions) led individuals to recall more and to think about their data in different ways.
Figure 5: An excerpt of P2's sketch, which exemplifies how arrows were used across participants' drawings. Arrows were most commonly used to indicate a workflow of data movement between data components or from the data source. They were also used to denote links between parts of the data.
...and 14 more figures
