The METRIC-framework for assessing data quality for trustworthy AI in medicine: a systematic review

Daniel Schwabe, Katinka Becker, Martin Seyferth, Andreas Klaß, Tobias Schäffter

TL;DR

The METRIC-framework is proposed: a specialised data quality framework for medical training data comprising 15 awareness dimensions along which developers of medical ML applications should investigate the content of a dataset, laying the foundation for trustworthy AI in medicine.

Abstract

The adoption of machine learning (ML) and, more specifically, deep learning (DL) applications into all major areas of our lives is underway. The development of trustworthy AI is especially important in medicine due to the large implications for patients' lives. While trustworthiness concerns various aspects including ethical, technical and privacy requirements, we focus on the importance of data quality (training/test) in DL. Since data quality dictates the behaviour of ML products, evaluating data quality will play a key part in the regulatory approval of medical AI products. We perform a systematic review following PRISMA guidelines using the databases PubMed and ACM Digital Library. We identify 2362 studies, out of which 62 records fulfil our eligibility criteria. From this literature, we synthesise the existing knowledge on data quality frameworks and combine it with the perspective of ML applications in medicine. As a result, we propose the METRIC-framework, a specialised data quality framework for medical training data comprising 15 awareness dimensions, along which developers of medical ML applications should investigate a dataset. This knowledge helps to reduce biases as a major source of unfairness, increase robustness, facilitate interpretability and thus lays the foundation for trustworthy AI in medicine. Incorporating such systematic assessment of medical datasets into regulatory approval processes has the potential to accelerate the approval of ML products and builds the basis for new standards.

Paper Structure

This paper contains 27 sections, 6 figures, and 6 tables.

Figures (6)

  • Figure 1: PRISMA flow diagram. The flow diagram shows the number of records identified, included and excluded at the different stages of the systematic review. The eligibility criteria for inclusion and exclusion are presented on the bottom right-hand side. From a total of 2362 identified studies, the resulting literature corpus on data quality for trustworthy AI in medicine includes 62 studies.
  • Figure 2: Studies included in our literature corpus sorted by publication date. The 62 studies are divided into the three categories general data (28), big data (7) and ML data (27), which represent major changes in the perception of data quality. The studies' affiliation to non-life science and life science related topics is indicated as well.
  • Figure 3: The METRIC-framework. This specialised framework for evaluating data quality of medical training data includes a comprehensive set of awareness dimensions. The inner circle divides data quality into five clusters. These clusters contain a total of 15 data quality dimensions, which are shown on the outer circle. The subdimensions presented in gray on the border of the figure contribute to the superordinate dimension. Due to the shape of the graphic, we refer to it as the wheel of data quality.
  • Figure 4: The cluster data management is concerned with the effective usage of the dataset. It includes basic requirements for the dataset but does not address data quality issues regarding its content. These requirements can therefore be seen as a prerequisite for assessment using the METRIC-framework. Figuratively speaking, the data management cluster serves as a stable foundation for the wheel of data quality.
  • Figure 5: Categorisation of dimensions along the properties quantitative vs. qualitative measure (left) and use case dependence for evaluating data quality (right). The affiliation to a category is colour-coded. The colour scale is presented in the inner circle.
  • ...and 1 more figure