Development of Automated Data Quality Assessment and Evaluation Indices by Analytical Experience

Yuka Haruki, Kei Kato, Yuki Enami, Hiroaki Takeuchi, Daiki Kazuno, Kotaro Yamada, Teruaki Hayashi

TL;DR

This paper tackles the challenge of inconsistent data-quality assessments in data markets by introducing a universal automated data-quality assessment tool that generates quality metadata across ten practical indices. It evaluates the tool through two studies: a questionnaire with 41 participants across experience levels and an eye-tracking cognitive validation using six meteorological datasets. Findings show that quality metadata reduces misrecognition and non-evaluable judgments, with higher utility perceived by experienced evaluators; completeness emerges as a key determinant of data purchase decisions, while the benefits of metadata depend on user experience. The work advances data-distribution practices by enabling cross-field DQA and providing insights into how human cognition interacts with automated quality summaries, paving the way for broader adoption in data platforms.

Abstract

The societal need to leverage third-party data has driven the data-distribution market and increased the importance of data quality assessment (DQA) in data transactions between organizations. However, DQA requires expert knowledge of raw data and related data attributes, which hinders consensus-building in data purchasing. This study focused on the differences in DQAs between experienced and inexperienced data handlers. We performed two experiments. The first was a questionnaire survey in which 41 participants with varying levels of data-handling experience evaluated 12 data samples on 10 predefined indices, with and without quality metadata generated by the automated tool. The second was an eye-tracking experiment that examined participants' viewing behavior during data evaluation. The results showed that quality metadata generated by the automated tool can reduce misrecognition in DQA. Experienced data handlers rated the quality metadata highly, whereas semi-experienced users gave it the lowest ratings. This study contributes to enhancing data understanding within organizations and promoting the distribution of valuable data by proposing an automated tool to support DQAs.
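To make the idea of automatically generated quality metadata concrete, the following is a minimal sketch rather than the authors' tool. It assumes that completeness can be approximated as the share of non-missing cells in a table; the function names `completeness` and `quality_metadata`, the output fields, and the sample rows are all hypothetical, and the paper's actual index definitions are not given in this section.

```python
from typing import Optional, Sequence


def completeness(rows: Sequence[Sequence[Optional[object]]]) -> float:
    """Return the share of cells that are not missing.

    Assumption: a cell counts as missing if it is None or an empty string.
    """
    total = sum(len(row) for row in rows)
    if total == 0:
        return 0.0
    filled = sum(
        1 for row in rows for cell in row if cell is not None and cell != ""
    )
    return filled / total


def quality_metadata(rows: Sequence[Sequence[Optional[object]]]) -> dict:
    """Build a small, illustrative quality-metadata summary for a table."""
    return {
        "n_rows": len(rows),
        "n_columns": max((len(row) for row in rows), default=0),
        "completeness": round(completeness(rows), 3),
    }


# Made-up sample rows (date, wind speed, humidity), for illustration only.
sample = [
    ["2024-01-01", 3.2, 61],
    ["2024-01-02", None, 58],
    ["2024-01-03", 4.1, ""],
    ["2024-01-04", 2.7, 64],
]

metadata = quality_metadata(sample)
print(metadata)
```

A summary like this could be shown next to a raw data sample so that an evaluator does not have to scan the table for missing values by hand.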

Paper Structure

This paper contains 23 sections, 3 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Example of quality metadata
  • Figure 2: Classification of subject groups and assignment of datasets in the experiment (Group $\alpha$ and $\beta$)
  • Figure 3: Ratio of subjects who answered "cannot evaluate."
  • Figure 4: Change in coefficients of variation (CV) with viewing quality metadata
  • Figure 5: Change in false answer ratio with viewing quality metadata
  • ...and 3 more figures