Data-driven Synthesis of Magnetic Resonance Spectroscopy Data using a Variational Autoencoder

Dennis M. J. van de Sande, Julian P. Merkofer, Sina Amirrajab, Mitko Veta, Gerhard S. Drenthen, Jacobus F. A. Jansen, Marcel Breeuwer

TL;DR

This work proposes a data-driven framework for synthesizing in-vivo MRS data using a variational autoencoder (VAE) trained exclusively on measured single-voxel spectroscopy data and introduces a structured evaluation framework for generative MRS methods, emphasizing the importance of application-aware validation when synthetic data are used for downstream analysis.

Abstract

The development of deep learning methods for magnetic resonance spectroscopy (MRS) is often hindered by limited availability of large, high-quality training datasets. While physics-based simulations are commonly used to mitigate this limitation, accurately modeling all in-vivo signal components remains challenging. In this work, we propose a data-driven framework for synthesizing in-vivo MRS data using a variational autoencoder (VAE) trained exclusively on measured single-voxel spectroscopy data. The model learns a low-dimensional latent representation of complex-valued spectra and enables generation of new samples through latent-space sampling and interpolation. The generative performance of the proposed approach is evaluated using a comprehensive set of complementary analyses, including reconstruction quality, feature-level similarity using low-dimensional embeddings, application-based signal quality metrics, and metabolite quantification agreement. The results demonstrate that the VAE accurately reconstructs dominant spectral patterns and generates synthetic spectra that occupy the same feature space as in-vivo data. In an example application targeting GABA-edited spectroscopy, augmenting limited subsets of transients with synthetic spectra improves signal quality metrics such as signal-to-noise ratio, linewidth, and shape scores. However, the results also reveal limitations of the generative approach, including under-representation of stochastic noise and reduced accuracy in absolute metabolite quantification, particularly for applications sensitive to concentration estimates. These findings highlight both potential and limitations of data-driven MRS synthesis. Beyond the proposed model, this study introduces a structured evaluation framework for generative MRS methods, emphasizing the importance of application-aware validation when synthetic data are used for downstream analysis.
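
The two generation mechanisms named in the abstract, latent-space sampling and interpolation, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the standard-normal prior, the diagonal-Gaussian posterior, and the linear interpolation scheme are all assumptions chosen for illustration. The encoder and decoder networks are omitted, so the arrays below stand in for latent codes only.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I).

    Assumes the encoder outputs the mean and log-variance of a
    diagonal Gaussian posterior (the usual VAE parameterization).
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def sample_prior(n_samples, latent_dim, rng):
    """Draw latent codes from a standard-normal prior (assumed)."""
    return rng.standard_normal((n_samples, latent_dim))


def interpolate_latents(z_a, z_b, n_steps):
    """Linearly interpolate between two latent codes, endpoints included."""
    alphas = np.linspace(0.0, 1.0, n_steps)[:, None]
    return (1.0 - alphas) * z_a[None, :] + alphas * z_b[None, :]


# Hypothetical usage: latent_dim = 4 is an arbitrary illustrative value.
rng = np.random.default_rng(0)
z_random = sample_prior(n_samples=3, latent_dim=4, rng=rng)
z_path = interpolate_latents(np.zeros(4), np.ones(4), n_steps=5)
```

In a full pipeline, each row of `z_random` or `z_path` would be passed through the trained decoder to obtain a synthetic spectrum.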

Paper Structure (17 sections, 7 equations, 9 figures, 1 table)

Figures (9)

  • Figure 1: Overview of the model architecture. Real and imaginary parts of the spectrum are concatenated and used as input for a fully connected neural network
  • Figure 2: Overview of the data generation pipeline. Real, in vivo spectra are first encoded into latent space. New samples are then generated using one of three methods: random sampling, interpolation, or hybrid sampling. Finally, all generated samples are decoded back into spectra.
  • Figure 3: Overview of the application-based evaluation method. Ground-truth spectra from a single subject are used to select a subset of $n$ transients. These transients are encoded by the encoder neural network and used to generate synthetic samples. The decoder neural network generates synthetic transients until the total matches the number of transients in the ground-truth spectra. For the evaluation, all three datasets are processed and fitted using Osprey.
  • Figure 4: Representative in-vivo spectra (blue) and corresponding reconstructions (orange) with residuals (black) for two subjects. Top panels show OFF and ON spectra with small, flat residuals, indicating accurate reconstruction. Bottom panels show another subject with slightly larger residuals, primarily around the water signal, while the overall spectral shapes are preserved.
  • Figure 5: UMAP visualizations comparing ground-truth in-vivo spectra with synthetic spectra generated using three different data generation methods: random sampling (RS) (green), interpolation (red), and hybrid sampling (blue). The UMAP embedding is learned using the ground-truth data, after which synthetic spectra generated from a single subset of transients are projected into the same low-dimensional space. Each point represents an individual spectrum, with markers indicating ON and OFF acquisitions. The shaded regions represent kernel density estimates of the embedded spectral distributions for both ground-truth and synthetic data, illustrating the overall spatial extent and degree of overlap between datasets. Clusters corresponding to individual subjects are visible across all panels.
  • ...and 4 more figures
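
The input representation described in the Figure 1 caption, where the real and imaginary parts of a complex spectrum are concatenated into one real-valued vector, can be sketched as follows. The helper names and the ordering (real part first, then imaginary part) are assumptions for illustration; the captions do not specify the ordering or any normalization.

```python
import numpy as np


def complex_to_input(spectra):
    """Concatenate real and imaginary parts along the last axis.

    A complex array of shape (..., n_points) becomes a real array of
    shape (..., 2 * n_points). Real-first ordering is an assumption.
    """
    spectra = np.asarray(spectra)
    return np.concatenate([spectra.real, spectra.imag], axis=-1)


def output_to_complex(x):
    """Invert complex_to_input: split the last axis in half and recombine."""
    x = np.asarray(x)
    n_points = x.shape[-1] // 2
    return x[..., :n_points] + 1j * x[..., n_points:]


# Hypothetical usage with two toy "spectra" of three points each.
toy = np.array([[1 + 2j, 3 - 1j, 0 + 0.5j],
                [0 - 1j, 2 + 2j, 4 + 0j]])
x = complex_to_input(toy)      # shape (2, 6), real-valued
back = output_to_complex(x)    # shape (2, 3), complex-valued
```

A decoder that outputs vectors in the same layout can be mapped back to complex spectra with `output_to_complex`, which makes the round trip lossless.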