Machine Learning Practitioners' Views on Data Quality in Light of EU Regulatory Requirements: A European Online Survey
Yichun Wang, Kristina Irion, Paul Groth, Hazar Harmouch
TL;DR
This paper investigates how EU regulatory requirements shape data quality practices in machine learning by integrating technical data quality dimensions with GDPR and AI Act provisions. Using a design-science approach, it develops a practical framework and validates it with an online survey of 185 EU data practitioners to identify gaps, needs, and collaboration patterns between technical and legal teams. Key contributions include a regulatory-aligned data quality vocabulary, empirical insights into practice (priorities, tooling, and challenges), and actionable recommendations for integrated tooling and cross-functional governance. The findings highlight substantial gaps between current practices and regulatory expectations, underscoring the value of compliance-aware data quality management to enable responsible and trustworthy ML deployments across regulated contexts.
Abstract
Understanding how data quality aligns with regulatory requirements in machine learning (ML) systems is a critical challenge for practitioners navigating the evolving EU regulatory landscape. To address this, we first propose a practical framework aligning established data quality dimensions with specific EU regulatory requirements. Second, we conducted an online survey of over 180 EU-based data practitioners, investigating their approaches, key challenges, and unmet needs in ensuring that data quality in ML systems aligns with regulatory requirements. Our findings highlight crucial gaps between current practices and regulatory expectations, underscoring practitioners' need for more integrated data quality tools and better collaboration between technical and legal practitioners. These insights inform recommendations for bridging technical expertise and regulatory compliance, ultimately fostering responsible and trustworthy ML deployments.
