Adaptive Data Quality Scoring Operations Framework using Drift-Aware Mechanism for Industrial Applications
Firas Bayram, Bestoun S. Ahmed, Erik Hallin
TL;DR
The paper tackles the challenge of maintaining reliable, dynamic data quality scores for industrial AI by introducing an adaptive framework that couples ML-based data quality scoring with drift-detection-driven adaptation. It employs a drift detector using Jensen–Shannon divergence and p-value testing to trigger recalibration, and aggregates multiple quality dimensions into a single PCA-based score $\text{DQS}_i = \text{PCA}(DQ_1,\dots,DQ_n)$. Empirical results from an ESR use case show substantial reductions in processing time while preserving or enhancing predictive accuracy, and reveal meaningful shifts in dynamic dimensions like Timeliness and Skewness after drift. The approach demonstrates practical benefits for real-time, large-scale industrial settings by balancing accuracy, responsiveness, and computational efficiency, with clear guidance on deployment and future integration into broader data-driven AI systems.
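To make the two mechanisms named above more concrete, here is a minimal sketch, not the authors' implementation. It assumes that drift is tested by comparing histograms of a reference window and a current window with the Jensen–Shannon divergence, that the p-value comes from a permutation test, and that the PCA aggregation projects the quality dimensions onto the first principal component. The function names, bin count, and permutation count are all hypothetical choices made for illustration.

```python
import numpy as np


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (base 2, so bounded by 1) of two histograms."""
    p = np.asarray(p, dtype=float) + eps  # eps avoids log(0) on empty bins
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        return np.sum(a * np.log2(a / b))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def drift_p_value(reference, current, bins=20, n_perm=200, seed=0):
    """Permutation test (an assumption): is the observed JS divergence
    larger than expected if both windows came from the same distribution?"""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([reference, current])
    edges = np.histogram_bin_edges(pooled, bins=bins)  # shared bin edges

    def stat(a, b):
        hist_a, _ = np.histogram(a, bins=edges)
        hist_b, _ = np.histogram(b, bins=edges)
        return js_divergence(hist_a, hist_b)

    observed = stat(reference, current)
    n = len(reference)
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        if stat(shuffled[:n], shuffled[n:]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


def pca_score(dq_matrix):
    """Project rows of (DQ_1, ..., DQ_n) onto the first principal component.
    The sign of the component is arbitrary, as with any PCA."""
    x = np.asarray(dq_matrix, dtype=float)
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    reference = rng.normal(0.0, 1.0, size=300)
    shifted = rng.normal(2.0, 1.0, size=300)  # synthetic drifted window
    divergence, p_value = drift_p_value(reference, shifted)
    if p_value < 0.05:  # hypothetical significance level
        print("drift detected: recalibrate the quality scoring models")
```

In a deployment along these lines, a p-value below the chosen significance level would trigger recalibration of the dynamic quality dimensions, while `pca_score` would be applied to the per-record dimension scores to produce a single aggregated value.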
Abstract
Within data-driven artificial intelligence (AI) systems for industrial applications, ensuring the reliability of incoming data streams is integral to trustworthy decision-making. One approach to assessing data validity is data quality scoring, which assigns a score to each data point or stream based on various quality dimensions. However, certain dimensions are dynamic and must be adapted to the system's current conditions. Existing methods often overlook this aspect, making them inefficient in dynamic production environments. In this paper, we introduce the Adaptive Data Quality Scoring Operations Framework, a novel framework developed to address the challenges posed by dynamic quality dimensions in industrial data streams. The framework integrates a dynamic change detector that actively monitors and adapts to changes in data quality, keeping the quality scores relevant. We evaluate the proposed framework's performance in a real-world industrial use case. The experimental results show high predictive performance and efficient processing time, highlighting the framework's effectiveness in practical quality-driven AI applications.
