Machine Learning Practitioners' Views on Data Quality in Light of EU Regulatory Requirements: A European Online Survey

Yichun Wang, Kristina Irion, Paul Groth, Hazar Harmouch

TL;DR

This paper investigates how EU regulatory requirements shape data quality practices in machine learning by integrating technical data quality dimensions with GDPR and AI Act provisions. Using a design‑science approach, it develops a practical framework and validates it with an online survey of 185 EU data practitioners to identify gaps, needs, and collaboration patterns between technical and legal teams. Key contributions include a regulatory‑aligned data quality vocabulary, empirical insights into practice (priorities, tooling, and challenges), and actionable recommendations for integrated tooling and cross‑functional governance. The findings highlight substantial gaps between current practices and regulatory expectations, underscoring the value of compliance‑aware data quality management to enable responsible and trustworthy ML deployments across regulated contexts.

Abstract

Understanding how data quality aligns with regulatory requirements in machine learning (ML) systems presents a critical challenge for practitioners navigating the evolving EU regulatory landscape. To address this, we first propose a practical framework aligning established data quality dimensions with specific EU regulatory requirements. Second, we conducted a comprehensive online survey with over 180 EU-based data practitioners, investigating their approaches, key challenges, and unmet needs when ensuring data quality in ML systems that align with regulatory requirements. Our findings highlight crucial gaps between current practices and regulatory expectations, underscoring practitioners' need for more integrated data quality tools and better collaboration between technical and legal practitioners. These insights inform recommendations for bridging technical expertise and regulatory compliance, ultimately fostering responsible and trustworthy ML deployments.

Paper Structure

This paper contains 14 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Survey demographics: self-reported team roles (left) and industries (right)
  • Figure 2: Data quality management activities and challenges. Error bars indicate 95% CI for the proportion of respondents.
  • Figure 3: Correlation heatmaps: (a) Activities correlation; (b) Activities × unsolved challenges; (c) Activities × motivating factors.
  • Figure 4: Organisational scale and tool diversity
  • Figure 5: Familiarity and responsibility