Data Guards: Challenges and Solutions for Fostering Trust in Data

Nicole Sultanum; Dennis Bromley; Michael Correll

TL;DR

The paper addresses how trust in data artifacts is established when data may be dirty or intentionally deceptive. It uses two rounds of interviews with data producers and consumers, complemented by a card-sorting exercise, to identify trust barriers and propose seven data-guard strategies grouped into Overview, Details, and Community clusters. Key contributions include a barrier-based framework (B1-B6), five design-goal mappings (G1-G5), and seven consumer-focused data guards (Data and Pipeline Tests, Data Quality Agent, Data and Pipeline Change Alerts, Explanation and Status, Data Traces, Stamp of Approval, and Crowd Wisdom), which were evaluated through consumer feedback. The work advocates embedding data guards into analytics tools to improve trust, acknowledges trade-offs such as added complexity and alert fatigue, and calls for future research to operationalize and evaluate these guards in practice.

Abstract

From dirty data to intentional deception, there are many threats to the validity of data-driven decisions. Making use of data, especially new or unfamiliar data, therefore requires a degree of trust or verification. How is this trust established? In this paper, we present the results of a series of interviews with both producers and consumers of data artifacts (outputs of data ecosystems like spreadsheets, charts, and dashboards) aimed at understanding strategies and obstacles to building trust in data. We find a recurring need, but lack of existing standards, for data validation and verification, especially among data consumers. We therefore propose a set of data guards: methods and tools for fostering trust in data artifacts.

Paper Structure (12 sections, 1 figure)

Introduction
Related Work
Data Workers and Trust-Building
Smells and Mirages: Threats to Data Reliability
Instrumenting Trust: Profiling and Verification
Characterizing Data Trust Needs
Barriers for Data Trust
Mitigation Strategies and Desiderata
Data Guards
Data Guards Ideation
Consumer Feedback on Data Guards
Discussion and Closing Thoughts

Figures (1)

Figure 1: Card sorting results for different data-trust solutions broken out by participant and solution and colored by rank (from #1 to #7). Individual boxes indicate which rank was selected by whom for each solution. Solutions are sorted vertically by recursive ranking: highest number of rank #1's first, followed by rank #2's, and so on.
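
The "recursive ranking" sort order described in the Figure 1 caption can be illustrated with a minimal sketch. The function name, the solution labels (guard_a, guard_b, guard_c), and the rankings below are hypothetical and do not reproduce the study's data; the sketch only assumes that each participant assigns one rank to each solution.

```python
from collections import Counter


def recursive_rank_order(rankings, num_ranks):
    """Order solutions by number of rank-1 votes, then rank-2 votes, and so on.

    rankings: dict mapping a solution name to the list of ranks it received,
              one entry per participant (hypothetical input format).
    """
    def sort_key(solution):
        counts = Counter(rankings[solution])
        # Negate the counts so that more votes at a better rank sorts earlier.
        return tuple(-counts[rank] for rank in range(1, num_ranks + 1))

    return sorted(rankings, key=sort_key)


# Hypothetical rankings from four participants over three solutions.
example = {
    "guard_a": [1, 2, 1, 2],  # two rank-1 votes, two rank-2 votes
    "guard_b": [2, 1, 3, 1],  # two rank-1 votes, one rank-2 vote
    "guard_c": [3, 3, 2, 3],  # no rank-1 votes
}

order = recursive_rank_order(example, num_ranks=3)
print(order)
```

In this made-up example, guard_a and guard_b tie on rank-1 votes, so the tie is broken by their rank-2 counts, which is the step the caption calls recursive.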