
FetMRQC: a robust quality control system for multi-centric fetal brain MRI

Thomas Sanchez, Oscar Esteban, Yvan Gomez, Alexandre Pron, Mériam Koob, Vincent Dunet, Nadine Girard, Andras Jakab, Elisenda Eixarch, Guillaume Auzias, Meritxell Bach Cuadra

TL;DR

FetMRQC addresses the challenge of reliable quality assessment (QA) and quality control (QC) in fetal brain MRI, where fetal motion and heterogeneous acquisition protocols create strong domain shifts. The authors propose an open-source framework that extracts a large, diverse set of image quality metrics (IQMs) from unprocessed T2-weighted stacks and trains random forests to perform regression for QA and binary classification for QC, complemented by per-stack HTML reports that aid expert screening. They validate it on a large multicenter dataset (over 1600 stacks from four institutions and 13 scanners), showing good generalization to unseen data, interpretability through feature importance, and better performance than deep-learning (DL) baselines in cross-domain settings. The work provides a practical, scalable tool to improve robustness and reproducibility in fetal neuroimaging pipelines, with the potential to improve downstream processing such as super-resolution reconstruction and segmentation.
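
The per-stack HTML reports are described only at a high level here. As a minimal sketch of the idea, the stdlib-only Python below renders one stack's IQM values and predicted rating into a standalone HTML page. The function name `render_stack_report`, the page layout, and any metric names passed to it are hypothetical; this does not reproduce FetMRQC's actual reports.

```python
import html
from typing import Dict


def render_stack_report(stack_name: str, iqms: Dict[str, float], predicted_rating: float) -> str:
    """Render a minimal standalone HTML page for one stack (hypothetical layout)."""
    title = html.escape(stack_name)
    # One table row per IQM, sorted by name so that reports are easy to compare.
    rows = "\n".join(
        f"    <tr><td>{html.escape(name)}</td><td>{value:.3f}</td></tr>"
        for name, value in sorted(iqms.items())
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>QC report: {title}</title></head>\n'
        "<body>\n"
        f"  <h1>{title}</h1>\n"
        f"  <p>Predicted quality rating: {predicted_rating:.2f}</p>\n"
        "  <table>\n"
        "    <tr><th>IQM</th><th>Value</th></tr>\n"
        f"{rows}\n"
        "  </table>\n"
        "</body></html>\n"
    )
```

For example, `render_stack_report("sub-01_run-1_T2w", {"snr": 12.4}, 2.1)` returns a page that can be written to disk and opened in a browser. Escaping names with `html.escape` keeps arbitrary file or metric names from breaking the markup.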

Abstract

Fetal brain MRI is becoming an increasingly relevant complement to neurosonography for perinatal diagnosis, allowing fundamental insights into fetal brain development throughout gestation. However, uncontrolled fetal motion and heterogeneity in acquisition protocols lead to data of variable quality, potentially biasing the outcome of subsequent studies. We present FetMRQC, an open-source machine-learning framework for automated image quality assessment and quality control that is robust to domain shifts induced by the heterogeneity of clinical data. FetMRQC extracts an ensemble of quality metrics from unprocessed anatomical MRI and combines them to predict experts' ratings using random forests. We validate our framework on an unprecedentedly large and diverse dataset of more than 1600 manually rated fetal brain T2-weighted images from four clinical centers and 13 different scanners. Our study shows that FetMRQC's predictions generalize well to unseen data while being interpretable. FetMRQC is a step towards more robust fetal brain neuroimaging, which has the potential to offer new insights into the developing human brain.
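
The abstract's core recipe, an ensemble of IQMs fed to random forests that predict expert ratings, can be sketched with scikit-learn's `RandomForestRegressor` (QA) and `RandomForestClassifier` (QC). This is an illustration under stated assumptions, not FetMRQC's code: the IQM table and ratings are synthetic, the 0 to 4 rating scale and the QC cut-off `QC_THRESHOLD` are assumptions, and the hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for an IQM table: one row per T2w stack, one column per IQM.
n_stacks, n_iqms = 200, 10
X = rng.normal(size=(n_stacks, n_iqms))

# Synthetic "expert ratings" on an assumed 0-4 scale, driven by two IQMs plus noise.
noise = rng.normal(scale=0.3, size=n_stacks)
ratings = np.clip(2.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + noise, 0.0, 4.0)

# Binary QC labels from an assumed cut-off on the rating (1 = include, 0 = exclude).
QC_THRESHOLD = 1.0  # hypothetical value
labels = (ratings >= QC_THRESHOLD).astype(int)

train, test = slice(0, 150), slice(150, None)

# QA: regress the continuous rating.
qa_model = RandomForestRegressor(n_estimators=200, random_state=0)
qa_model.fit(X[train], ratings[train])
predicted_ratings = qa_model.predict(X[test])

# QC: classify each stack as include / exclude.
qc_model = RandomForestClassifier(n_estimators=200, random_state=0)
qc_model.fit(X[train], labels[train])
predicted_labels = qc_model.predict(X[test])

# Impurity-based importances of the QA forest; they sum to 1 over the IQMs.
importances = qa_model.feature_importances_
```

A random split keeps the sketch short; to probe the cross-scanner generalization the paper evaluates, the held-out stacks would instead come from scanners absent from training.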

Paper Structure

This paper contains 23 sections, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Variations in data quality illustrated. A -- Comparison of adult T1-weighted (T1w) data from the ABIDE dataset [di2014autism] with fetal acquisitions. Among the excluded scans, the adult image on the left suffers from severe motion artifacts, while large coil artifacts corrupt the image on the right. The fetal data suffer from strong intensity changes between slices and from signal drop; in the through-plane view, strong inter-slice motion makes the brain structures difficult to discern. B -- Examples of data acquired on different scanners, with very different appearances. The in-plane and through-plane resolution, the field of view, the repetition time (TR), and the echo time (TE) can all change substantially between acquisition protocols. C -- Importance of quality control for super-resolution reconstruction (SRR), illustrated with NiftyMIC [ebner_automated_2020] and NeSVoR [xu2023nesvor], two SRR methods with built-in outlier rejection. In the top row, a subject is reconstructed using all available stacks (13 for NiftyMIC, 5 for NeSVoR), and each reconstruction shows large artifacts. In the bottom row, FetMRQC is plugged in, and removing the low-quality series (6 of 13 for NiftyMIC, 2 of 5 for NeSVoR) improves reconstruction quality.
  • Figure 2: A look into the dataset. A -- Illustration of the quality rating interface developed in this work. B -- Inter-rater agreement on the 211 stacks annotated by both raters; the global R value is 0.75. Note that stacks from La Timone were annotated only by Rater 2. C -- Distribution of the quality ratings across sites, on all data. The median values are 1.75 [0.84, 2.4] for BCN, 1.75 [1, 2.45] for CHUV, and 1 [0.1, 2.05] for KISPI.
  • Figure 3: Scanner-wise results for QA/QC. A -- Weighted F1 score for the QC task for each scanner used in leave-one-scanner-out (LoSo) cross-validation, sorted from the scanner with the fewest subjects to the one with the most. B -- Weighted F1 score for the QC task for each scanner in the pure testing set. C -- $R^2$ for the QA task for each scanner used in LoSo cross-validation. D -- $R^2$ for the QA task for each scanner in the pure testing set. Scores are aggregated by scanner, and the median performance of each method is shown as a black dashed line. In the QA (regression) panels, the red line at $0$ marks the baseline of a constant predictor. These results detail those presented in Table [tab:vanilla_perf].
  • Figure 4: Performance as a function of the number of scanners and training points, obtained by repeating leave-one-scanner-out cross-validation 20 times on different random subsets of the data. (Top row.) Minimum (worst-case) performance across folds. (Middle row.) Median performance across folds; the smaller plots show the corresponding median absolute deviation. (Bottom row.) Maximum (best-case) performance across folds.
  • Figure 5: Most important IQMs for QA/QC. Feature importance for quality control (classification) on the left and for quality assessment (regression) on the right. The top row shows the top-25 IQMs of FetMRQC, and the bottom row shows the 20 selected IQMs that form FetMRQC-20. Blue IQMs are intensity-based, orange are mask- (or shape-) based, green are segmentation-based, pink are deep-learning-based, and brown are metadata-based. Hatched features denote the new IQMs proposed in this work. Error bars show the standard deviation across the scanner-wise cross-validation folds. Note that the scales differ widely between plots: the highest feature importance is around 0.055 for classification but around 0.23 for regression.
  • ...and 4 more figures
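
Figure 2B summarizes inter-rater agreement with a single R value of 0.75. Assuming R denotes Pearson's correlation coefficient between the two raters' scores on the shared stacks (the caption does not say so explicitly), it can be computed as below; Python 3.10+ also provides `statistics.correlation` for the same quantity.

```python
import math
from typing import Sequence


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two raters' scores on the same stacks."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long rating sequences of length >= 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # Covariance and variances are left unnormalized, since the factors cancel.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when a rater gives constant scores")
    return cov / math.sqrt(var_a * var_b)
```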
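
Figure 3 reports a weighted F1 score for QC and $R^2$ for QA. A minimal pure-Python sketch of both metrics follows. The weighted F1 averages per-class F1 scores weighted by each class's support, which matches the usual definition (e.g. scikit-learn's `f1_score` with `average="weighted"`), although the caption does not state which implementation was used. The last line also shows why a constant predictor sits on the red line at 0: predicting the mean of the true ratings everywhere makes the residual and total sums of squares equal.

```python
from collections import Counter
from typing import Sequence


def weighted_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 per class, averaged with weights proportional to each class's support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_cls - tp
        # 2*tp + fp + fn >= n_cls > 0, so the division is always defined here.
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += (n_cls / total) * f1
    return score


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# A constant predictor equal to the mean rating scores exactly 0.
print(r_squared([1.0, 2.0, 3.0, 4.0], [2.5] * 4))  # → 0.0
```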
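
Figure 4 is built from repeated leave-one-scanner-out cross-validation, summarized by the worst-case, median, and best-case fold scores and the median absolute deviation (MAD). The sketch below shows fold generation grouped by scanner and the per-run fold summary. `random_scanner_subset` is a hypothetical stand-in for the random subsampling, since the caption does not say exactly how the 20 subsets were drawn.

```python
import random
import statistics
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_scanner_out(scanners: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held-out scanner, train indices, test indices), one fold per scanner."""
    for held_out in sorted(set(scanners)):
        test = [i for i, s in enumerate(scanners) if s == held_out]
        train = [i for i, s in enumerate(scanners) if s != held_out]
        yield held_out, train, test


def random_scanner_subset(scanners: Sequence[str], n_scanners: int, seed: int) -> List[int]:
    """Hypothetical subsampling: indices of all stacks from n_scanners randomly chosen scanners."""
    rng = random.Random(seed)
    chosen = set(rng.sample(sorted(set(scanners)), n_scanners))
    return [i for i, s in enumerate(scanners) if s in chosen]


def summarize(fold_scores: Sequence[float]) -> Dict[str, float]:
    """Worst-case, median and best-case fold score, plus the median absolute deviation."""
    med = statistics.median(fold_scores)
    mad = statistics.median([abs(s - med) for s in fold_scores])
    return {"min": min(fold_scores), "median": med, "max": max(fold_scores), "mad": mad}
```

In a repeated experiment, one would draw a subset with a different `seed` for each repetition, run the folds on it, and pass the per-fold weighted F1 or $R^2$ values to `summarize`.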
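
Figure 5 ranks IQMs by their importance across the scanner-wise folds, with error bars giving the standard deviation. Given one importance dictionary per fold (for example, a fitted forest's `feature_importances_` zipped with the IQM names), the aggregation can be sketched as follows. Ranking by the mean and using the population standard deviation (`statistics.pstdev`) are assumptions; the caption specifies neither.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def aggregate_importances(
    per_fold: Sequence[Dict[str, float]], top_k: int
) -> List[Tuple[str, float, float]]:
    """Return (IQM name, mean importance, std) for the top_k IQMs, ranked by mean across folds."""
    rows = []
    for name in per_fold[0]:
        values = [fold[name] for fold in per_fold]
        # Population std across folds (assumed; the sample std, statistics.stdev, is also plausible).
        rows.append((name, statistics.mean(values), statistics.pstdev(values)))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top_k]
```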