Machine-learning competition to grade EEG background patterns in newborns with hypoxic-ischaemic encephalopathy

Fabio Magarelli, Geraldine B. Boylan, Saeed Montazeri, Feargal O'Sullivan, Dominic Lightbody, Minoo Ashoori, Tamara Skoric, John M. O'Toole

TL;DR

This study presents a machine-learning competition to grade EEG background patterns in newborns with hypoxic-ischaemic encephalopathy, using a retrospective multi-centre ANSeR dataset and an open-access platform to benchmark models. Four diverse approaches (a CNN, ConvNeXt with Gramian Angular Field representations, gradient boosting with NEURAL features, and an SVM with qEEG features) were submitted and then evaluated on a held-out validation set. Some methods performed strongly on the public testing data, but all showed substantial generalisation gaps on unseen data. The results indicate that deep learning models can generalise better than feature-based methods in this context, yet they require larger and more diverse datasets to be reliable for clinical use; reduced-channel configurations show practical potential for resource-limited NICUs. The work highlights the value of open data, reproducible ML pipelines, and collaborative competition frameworks for accelerating neonatal EEG analysis, while outlining limitations and avenues for future work.

Abstract

Machine learning (ML) has the potential to support and improve expert performance in monitoring the brain function of at-risk newborns. Developing accurate and reliable ML models depends on access to high-quality, annotated data, a resource in short supply. ML competitions address this need by providing researchers access to expertly annotated datasets, fostering shared learning through direct model comparisons, and leveraging the benefits of crowdsourcing diverse expertise. We compiled a retrospective dataset containing 353 hours of EEG from 102 individual newborns from a multi-centre study. The data was fully anonymised and divided into training, testing, and held-out validation datasets. EEGs were graded for the severity of abnormal background patterns. Next, we created a web-based competition platform and hosted a machine learning competition to develop ML models for classifying the severity of EEG background patterns in newborns. After the competition closed, the top 4 performing models were evaluated offline on the separate held-out validation dataset. Although a feature-based model ranked first on the testing dataset, deep learning models generalised better on the validation dataset. All methods had a significant decline in validation performance compared to the testing performance. This highlights the challenges for model generalisation on unseen data, emphasising the need for held-out validation datasets in ML studies with neonatal EEG. The study underscores the importance of training ML models on large and diverse datasets to ensure robust generalisation. The competition's outcome demonstrates the potential of open-access data and shared ML development to foster a collaborative research environment and expedite the development of clinical decision-support tools for neonatal neuromonitoring.
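
The abstract describes dividing the data into training, testing, and held-out validation datasets. As a minimal sketch of one way such a split could be made, the Python below assigns whole newborns, rather than individual epochs, to each subset so that no newborn contributes data to more than one subset. The abstract does not state how the split was performed, so the newborn-level grouping is an assumption. The function name `split_by_newborn`, the 60/20/20 fractions, the identifiers, and the three epochs per newborn are also hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def split_by_newborn(epoch_ids, newborn_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Assign every newborn, with all of its epochs, to exactly one subset.

    The fractions are hypothetical; they are not taken from the paper.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    # Shuffle the unique newborn identifiers reproducibly.
    newborns = sorted(set(newborn_ids))
    random.Random(seed).shuffle(newborns)

    n_train = round(fractions[0] * len(newborns))
    n_test = round(fractions[1] * len(newborns))

    # Map each newborn to a subset; the remainder goes to validation.
    subset_of = {}
    for i, newborn in enumerate(newborns):
        if i < n_train:
            subset_of[newborn] = "training"
        elif i < n_train + n_test:
            subset_of[newborn] = "testing"
        else:
            subset_of[newborn] = "validation"

    # Every epoch follows the subset of the newborn it was recorded from.
    splits = defaultdict(list)
    for epoch, newborn in zip(epoch_ids, newborn_ids):
        splits[subset_of[newborn]].append(epoch)
    return dict(splits), subset_of


# Illustrative usage: 102 newborns with three made-up epochs each.
example_newborns = [f"newborn_{i // 3:03d}" for i in range(102 * 3)]
example_epochs = list(range(len(example_newborns)))
example_splits, example_assignment = split_by_newborn(example_epochs, example_newborns)
```

Grouping by newborn matters because epochs from the same recording are highly correlated. If they were spread across subsets, the testing score could overstate how well a model generalises to unseen newborns.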

Paper Structure

This paper contains 18 sections, 2 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: (Left) Data distribution within the different datasets; (Right) distribution of EEG grades.
  • Figure 2: On the public leaderboard, all users can see a visual representation of the scores obtained by each participant for every evaluation measure chosen by the competition host. These representations show the scores relative to the last and best submission of the user.
  • Figure 3: Workflow for creating and participating in an ML competition.
  • Figure 4: Flowchart of the classification of the 4 EEG grades using an SVM.
  • Figure 5: Confusion matrices for each model (CNN, ConvNeXt, XGBoost, and SVM) on both the testing and validation datasets. The numbers within the matrices represent the number of epochs. All models exhibited higher accuracy in predicting grade 1 in the testing datasets. Misclassification between grades 1 and 2 was pervasive across models, particularly in the validation dataset, as discriminating between these grades can prove challenging, even for expert neurophysiologists.
  • ...and 1 more figures