The Sim-to-Real Gap in MRS Quantification: A Systematic Deep Learning Validation for GABA

Zien Ma; S. M. Shermer; Oktay Karakuş; Frank C. Langbein

TL;DR

This work investigates and validates deep learning for quantifying complex, low-SNR, overlapping signals from MEGA-PRESS spectra. It devises a convolutional neural network (CNN) and a Y-shaped autoencoder (YAE), and selects the best models via Bayesian optimisation on 10,000 simulated spectra from slice-profile-aware MEGA-PRESS simulations.

Abstract

Magnetic resonance spectroscopy (MRS) is used to quantify metabolites in vivo and estimate biomarkers for conditions ranging from neurological disorders to cancers. Quantifying low-concentration metabolites such as GABA ($γ$-aminobutyric acid) is challenging due to low signal-to-noise ratio (SNR) and spectral overlap. We investigate and validate deep learning for quantifying complex, low-SNR, overlapping signals from MEGA-PRESS spectra, devise a convolutional neural network (CNN) and a Y-shaped autoencoder (YAE), and select the best models via Bayesian optimisation on 10,000 simulated spectra from slice-profile-aware MEGA-PRESS simulations. The selected models are trained on 100,000 simulated spectra. We validate their performance on 144 spectra from 112 experimental phantoms containing five metabolites of interest (GABA, Glu, Gln, NAA, Cr) with known ground truth concentrations across solution and gel series acquired at 3 T under varied bandwidths and implementations. These models are further assessed against the widely used LCModel quantification tool. On simulations, both models achieve near-perfect agreement (small MAEs; regression slopes $\approx 1.00$, $R^2 \approx 1.00$). On experimental phantom data, errors initially increased substantially. However, modelling variable linewidths in the training data significantly reduced this gap. The best augmented deep learning models achieved a mean MAE for GABA over all phantom spectra of 0.151 (YAE) and 0.160 (FCNN) in max-normalised relative concentrations, outperforming the conventional baseline LCModel (0.220). A sim-to-real gap remains, but physics-informed data augmentation substantially reduced it. Phantom ground truth is needed to judge whether a method will perform reliably on real data.
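As a minimal sketch of how a per-metabolite MAE "in max-normalised relative concentrations" could be computed: the code below assumes that max-normalisation means dividing each concentration vector by its largest component, which is an interpretation and not something the abstract spells out. The function names and the example values are hypothetical and purely illustrative; only the metabolite order (GABA, Glu, Gln, NAA, Cr) is taken from the abstract.

```python
METABOLITES = ["GABA", "Glu", "Gln", "NAA", "Cr"]  # order taken from the abstract


def max_normalise(concentrations):
    """Scale a concentration vector so that its largest entry equals 1.

    Assumption: this is what "max-normalised relative concentrations" means.
    """
    peak = max(concentrations)
    if peak <= 0:
        raise ValueError("at least one concentration must be positive")
    return [c / peak for c in concentrations]


def mae_per_metabolite(true_sets, predicted_sets):
    """Mean absolute error per metabolite, averaged over all spectra."""
    if len(true_sets) != len(predicted_sets) or not true_sets:
        raise ValueError("need the same non-zero number of true and predicted sets")
    n_metabolites = len(true_sets[0])
    totals = [0.0] * n_metabolites
    for truth, prediction in zip(true_sets, predicted_sets):
        truth_n = max_normalise(truth)
        prediction_n = max_normalise(prediction)
        for i in range(n_metabolites):
            totals[i] += abs(truth_n[i] - prediction_n[i])
    return [total / len(true_sets) for total in totals]


if __name__ == "__main__":
    # Hypothetical concentrations for a single spectrum (not data from the paper).
    truth = [[1.0, 2.0, 4.0, 8.0, 8.0]]
    prediction = [[2.0, 2.0, 4.0, 8.0, 8.0]]
    for name, err in zip(METABOLITES, mae_per_metabolite(truth, prediction)):
        print(f"{name}: MAE = {err:.3f}")
```

Because the error is computed on relative concentrations, a value such as 0.151 is read against a scale where the most abundant metabolite in each spectrum equals 1.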

Paper Structure (36 sections, 7 equations, 9 figures, 13 tables)

Introduction
Related Work
Conventional Quantification Methods
Machine Learning Methods
Limitations of Existing Work and Our Contribution
Simulation, Phantom Experiments, and Evaluation Metrics
Simulated Datasets
Concentration Sampling and Noise Injection
Experimental Dataset
Preprocessing the Datasets
Performance Evaluation
Experimental Ground-Truth Validation
Statistical Comparison of Quantification Errors
Deep Learning Architectures
CNN: Convolutional Multi-Class Regressor
...and 21 more sections

Figures (9)

Figure 1: The sequence diagram (a) shows the RF and gradient pulses with actual pulse shapes and timings used in the simulations. The initial excitation pulse is modelled as an ideal (instantaneous) slice-selective $90^\circ$ pulse and the corresponding slice selection gradient $G_z$ is therefore omitted. The excitation pulse excites a slice of thickness $3\,\mathrm{cm}$ perpendicular to the $z$-axis. The refocusing pulses refocus the magnetisation of $3\,\mathrm{cm}$ thick slabs perpendicular to the $x$ and $y$ axes, respectively, to define the localised voxel. What differentiates the MEGA-PRESS sequence from the standard PRESS sequence is the presence of two $20\,\mathrm{ms}$ frequency-selective Gaussian editing pulses (yellow and green) at $1.9\,\mathrm{ppm}$ for the ON acquisition. For the OFF spectra the editing pulses could in principle be omitted, but the simulation follows the experimental implementation where editing pulses at $7.5\,\mathrm{ppm}$, which have no effect on the metabolites of interest, are applied instead. The readout of the signal starts at $68\,\mathrm{ms}$ as indicated. Experimentally, it can last over $1\,\mathrm{s}$, depending on the dwell time and number of samples acquired. For a bandwidth of $2000\,\mathrm{Hz}$, the dwell time is $0.5\,\mathrm{ms}$, and acquiring $N=2048$ samples, a typical signal length, would therefore require $2048 \times 0.5\,\mathrm{ms} = 1.024\,\mathrm{s}$. The readout block in the diagram is truncated at $80\,\mathrm{ms}$ for clarity, to show the RF pulses and timings. To account for imperfect slice profiles of the refocusing pulses, the spectra are simulated on a spatial grid (b) and the average over all positions is calculated.
Figure 2: Illustration of the Y-shaped autoencoder (YAE) architecture. The input consists of one or more noisy MEGA-PRESS spectra (OFF, ON, DIFF using real, imaginary or magnitude representations). The encoder maps the input to a compressed latent representation. The decoder branch reconstructs denoised versions of the input spectra from this latent space. The quantifier branch predicts the metabolite concentrations from the same latent representation. Key components are highlighted: ① hidden layer activation function, ② dropout layer, ③ decoder output activation function, and ④ quantifier output activation function.
Figure 3: Validation and training concentration MAE for the top $25$ of $432$ CNN configurations from the grid search model selection (Table \ref{tab:cnn-simple-all}). Each pair of horizontal bars corresponds to one configuration: blue, validation MAE; red, training MAE. Error bars show standard deviation across five-fold cross-validation. Configurations are ordered by validation MAE (best at top). Dataset: $10{,}000$ simulated spectra, sum normalisation, basis set linewidth $2\,\mathrm{Hz}$, $100$ epochs.
Figure 4: Performance of $50$ repetitions of the Bayesian optimisation for the CNN model selection over the Bayesian optimisation iterations. The grey shaded area (right axis) shows the percentage of runs that selected the same best configuration as the full grid search ("converged").
Figure 5: Validation and training concentration MAE for YAE configurations from the final joint optimisation (Stage 3). Each pair of horizontal bars corresponds to one configuration: blue, validation MAE; red, training MAE. Error bars show standard deviation across five-fold cross-validation. Configurations are ordered by validation MAE (best at top). Dataset: $10{,}000$ simulated spectra, sum normalisation, $200$ epochs (Table \ref{tab:refined_space}).
...and 4 more figures
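The readout arithmetic in the Figure 1 caption (dwell time from bandwidth, then total acquisition time) can be checked with a short sketch. The function names are hypothetical; the relation dwell time = 1 / bandwidth is the standard sampling relation and matches the caption's numbers ($2000\,\mathrm{Hz}$, $0.5\,\mathrm{ms}$, $N=2048$).

```python
def dwell_time_s(bandwidth_hz):
    """Sampling interval in seconds for a given spectral bandwidth in Hz."""
    if bandwidth_hz <= 0:
        raise ValueError("bandwidth must be positive")
    return 1.0 / bandwidth_hz


def readout_duration_s(bandwidth_hz, n_samples):
    """Total readout time in seconds for n_samples points at the given bandwidth."""
    return n_samples * dwell_time_s(bandwidth_hz)


if __name__ == "__main__":
    bandwidth = 2000.0  # Hz, value from the Figure 1 caption
    samples = 2048      # N, value from the Figure 1 caption
    print(f"dwell time: {dwell_time_s(bandwidth) * 1e3:.3f} ms")
    print(f"readout:    {readout_duration_s(bandwidth, samples):.3f} s")
```

With the caption's values this gives a dwell time of 0.5 ms and a readout of 1.024 s, far longer than the $80\,\mathrm{ms}$ at which the diagram truncates the readout block.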
