Data Guards: Challenges and Solutions for Fostering Trust in Data
Nicole Sultanum, Dennis Bromley, Michael Correll
TL;DR
The paper examines how trust in data artifacts is established in the face of threats ranging from dirty data to intentional deception. It draws on two rounds of interviews with data producers and consumers, complemented by a card-sorting exercise, to identify trust barriers and propose seven data-guard strategies grouped into Overview, Details, and Community clusters. Key contributions include a barrier-based framework (B1-B6), five design-goal mappings (G1-G5), and seven consumer-focused data guards (Data and Pipeline Tests, Data Quality Agent, Data and Pipeline Change Alerts, Explanation and Status, Data Traces, Stamp of Approval, and Crowd Wisdom) validated through consumer feedback. The work advocates embedding data guards into analytics tools to improve trust, acknowledges trade-offs such as added complexity and alert fatigue, and calls for future research to operationalize and evaluate these guards in practice.
Abstract
From dirty data to intentional deception, there are many threats to the validity of data-driven decisions. Making use of data, especially new or unfamiliar data, therefore requires a degree of trust or verification. How is this trust established? In this paper, we present the results of a series of interviews with both producers and consumers of data artifacts (outputs of data ecosystems like spreadsheets, charts, and dashboards) aimed at understanding strategies and obstacles to building trust in data. We find a recurring need, but lack of existing standards, for data validation and verification, especially among data consumers. We therefore propose a set of data guards: methods and tools for fostering trust in data artifacts.
