The OxMat dataset: a multimodal resource for the development of AI-driven technologies in maternal and newborn child health

M. Jaleed Khan, Ioana Duta, Beth Albert, William Cooke, Manu Vatish, Gabriel Davis Jones

TL;DR

The paper identifies a gap in the availability of CTG data for maternal-fetal health research and introduces OxMat, the largest curated dataset of raw time-series CTG signals linked to extensive maternal and neonatal clinical data, collected from six sources over three decades. It details the data consolidation, clinician-led quality control (QC) and automated cleaning that produced 177,211 CTG recordings from 51,036 pregnancies, with 1,689,503 data points across 295 variables; the resource also includes three auxiliary datasets for future integration. OxMat emphasizes antepartum coverage (approximately 94% of CTGs) and near-complete outcome data, addressing the limitations of prior datasets, which were smaller, less detailed or focused on the intrapartum period. The dataset supports the development and validation of AI-driven prenatal care methods aimed at improving maternal and fetal outcomes, and serves as a foundation for future research in fetal monitoring and perinatal medicine.
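As a quick sanity check on the headline figures, the reported totals imply roughly 3.5 CTG recordings per pregnancy on average and, at approximately 94% antepartum, on the order of 166,600 antepartum recordings. The minimal Python sketch below does this arithmetic from the reported totals only; because the 94% share is itself approximate, the derived antepartum/intrapartum split is an estimate, not a figure stated in the paper.

```python
# Headline counts as reported in the TL;DR above.
N_CTGS = 177_211
N_PREGNANCIES = 51_036
ANTEPARTUM_SHARE = 0.94  # "approximately 94%", so derived counts are estimates


def mean_ctgs_per_pregnancy(n_ctgs: int, n_pregnancies: int) -> float:
    """Average number of CTG recordings per pregnancy."""
    return n_ctgs / n_pregnancies


def approx_count(total: int, share: float) -> int:
    """Approximate number of items corresponding to a fractional share of a total."""
    return round(total * share)


if __name__ == "__main__":
    ratio = mean_ctgs_per_pregnancy(N_CTGS, N_PREGNANCIES)
    antepartum = approx_count(N_CTGS, ANTEPARTUM_SHARE)
    print(f"Mean CTGs per pregnancy: {ratio:.2f}")                 # 3.47
    print(f"Approx. antepartum CTGs: {antepartum:,}")               # 166,578
    print(f"Approx. intrapartum CTGs: {N_CTGS - antepartum:,}")     # 10,633
```

A mean of about 3.47 says nothing about how recordings are spread across pregnancies; Figure 3 (Number of CTGs per pregnancy) is where that distribution is shown.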

Abstract

The rapid advancement of Artificial Intelligence (AI) in healthcare presents a unique opportunity to improve obstetric care, particularly through the analysis of cardiotocography (CTG) for fetal monitoring. However, the effectiveness of such technologies depends on the availability of large, high-quality datasets that are suitable for machine learning. This paper introduces the Oxford Maternity (OxMat) dataset, the world's largest curated dataset of CTGs, which features raw time-series CTG data and extensive clinical data for both mothers and babies and is well suited to machine learning. The OxMat dataset addresses a critical gap in women's health data by providing 177,211 unique CTG recordings from 51,036 pregnancies, carefully curated and reviewed since 1991. The dataset also comprises over 200 antepartum, intrapartum and postpartum clinical variables, with near-complete data for crucial outcomes such as stillbirth and acidaemia. Although the dataset also covers the intrapartum stage, around 94% of its CTGs are antepartum. This enables a unique focus on the underserved antepartum period, in which early detection of at-risk fetuses can significantly improve health outcomes. Our comprehensive review of existing datasets reveals their main limitations: insufficient volume and a lack of detailed clinical data and antepartum data. The OxMat dataset lays a foundation for future AI-driven prenatal care, offering a robust resource for developing and testing algorithms aimed at improving maternal and fetal health outcomes.
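To illustrate how raw CTG signals might be linked to pregnancy-level clinical variables for an antepartum-focused study, here is a minimal Python sketch. The class names, the field names (`fhr_bpm`, `toco`, `outcomes`) and the helper `antepartum_with_outcome` are hypothetical and do not reflect the actual OxMat schema; the sketch only shows the general pattern of joining recordings to outcome labels (such as stillbirth) and excluding records whose outcome is missing.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Stage(Enum):
    ANTEPARTUM = "antepartum"
    INTRAPARTUM = "intrapartum"


@dataclass
class CTGRecording:
    # Hypothetical fields for illustration; not the real OxMat schema.
    recording_id: str
    pregnancy_id: str
    stage: Stage
    fhr_bpm: List[float]  # raw fetal heart rate time series
    toco: List[float]     # raw uterine activity time series


@dataclass
class Pregnancy:
    pregnancy_id: str
    # Binary outcome variables keyed by name; a missing key means "not recorded".
    outcomes: Dict[str, Optional[bool]] = field(default_factory=dict)


def antepartum_with_outcome(
    recordings: List[CTGRecording],
    pregnancies: Dict[str, Pregnancy],
    outcome: str,
) -> List[Tuple[CTGRecording, bool]]:
    """Pair each antepartum CTG with its pregnancy's label for `outcome`,
    skipping CTGs whose pregnancy is unknown or whose outcome is missing."""
    pairs: List[Tuple[CTGRecording, bool]] = []
    for rec in recordings:
        if rec.stage is not Stage.ANTEPARTUM:
            continue
        preg = pregnancies.get(rec.pregnancy_id)
        if preg is None:
            continue
        label = preg.outcomes.get(outcome)
        if label is None:  # keeps explicit False labels, drops missing ones
            continue
        pairs.append((rec, label))
    return pairs


if __name__ == "__main__":
    pregnancies = {
        "P1": Pregnancy("P1", {"stillbirth": False}),
        "P2": Pregnancy("P2", {}),  # outcome not recorded
    }
    recordings = [
        CTGRecording("C1", "P1", Stage.ANTEPARTUM, [140.0, 142.0], [10.0, 12.0]),
        CTGRecording("C2", "P1", Stage.INTRAPARTUM, [135.0], [30.0]),
        CTGRecording("C3", "P2", Stage.ANTEPARTUM, [150.0], [8.0]),
    ]
    pairs = antepartum_with_outcome(recordings, pregnancies, "stillbirth")
    print([(r.recording_id, y) for r, y in pairs])  # [('C1', False)]
```

Near-complete outcome data matters for exactly this step: every recording whose pregnancy lacks the target label is silently dropped from the training set.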


Paper Structure

This paper contains 6 sections, 7 figures and 1 table.

Figures (7)

  • Figure 1: Dataset Curation Flowchart
  • Figure 2: Number of ultrasounds per mother
  • Figure 3: Number of CTGs per pregnancy
  • Figure 4: Number of pregnancies per mother
  • Figure 5: Yearly breakdown of the number of total data points and CTGs in the final dataset
  • ...and 2 more figures