Merging synthetic and real embryo data for advanced AI predictions

Oriana Presacan, Alexandru Dorobantiu, Vajira Thambawita, Michael A. Riegler, Mette H. Stensen, Mario Iliceto, Alexandru C. Aldea, Akriti Sharma

TL;DR

This work trained two generative models on two datasets to generate synthetic embryo images at five cell stages (2-cell, 4-cell, 8-cell, morula, and blastocyst) and showed that adding synthetic images to real training data improved classification performance.

Abstract

Accurate embryo morphology assessment is essential in assisted reproductive technology for selecting the most viable embryo. Artificial intelligence has the potential to enhance this process. However, the limited availability of embryo data presents challenges for training deep learning models. To address this, we trained two generative models using two datasets (one we created and made publicly available, and one existing public dataset) to generate synthetic embryo images at various cell stages, including 2-cell, 4-cell, 8-cell, morula, and blastocyst. These were combined with real images to train classification models for embryo cell stage prediction. Our results demonstrate that incorporating synthetic images alongside real data improved classification performance, with the model achieving 97% accuracy compared to 94.5% when trained solely on real data. This trend remained consistent when tested on an external Blastocyst dataset from a different clinic. Notably, even when trained exclusively on synthetic data and tested on real data, the model achieved a high accuracy of 92%. Furthermore, combining synthetic data from both generative models yielded better classification results than using data from a single generative model. Four embryologists evaluated the fidelity of the synthetic images through a Turing test, during which they annotated inaccuracies and offered feedback. The analysis showed that the diffusion model outperformed the generative adversarial network, deceiving embryologists 66.6% of the time versus 25.3% and achieving lower Fréchet inception distance scores.
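
For readers unfamiliar with the Fréchet inception distance mentioned above, its standard definition is given below. The symbols are not taken from the paper: $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are assumed here to denote the mean and covariance of Inception-network features computed over real and generated images, respectively.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

A lower score means the feature statistics of the generated images are closer to those of the real images, which is the sense in which the diffusion model is reported to score better than the generative adversarial network.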

Paper Structure

This paper contains 16 sections, 8 figures, and 6 tables.

Figures (8)

  • Figure 1: Pipeline of our proposed system, encompassing the training of generative models, the generation of synthetic data, the training of classification models, and the qualitative assessment conducted by embryologists.
  • Figure 2: Comparison of real versus synthetic images generated with StyleGAN and LDM models for each of the five embryo classes: 2-cell, 4-cell, 8-cell, morula, and blastocyst.
  • Figure 3: Classification accuracy trends on test data (100 real images) for the VGG model, trained with various combinations of synthetic images generated by LDM and StyleGAN models.
  • Figure 4: Classification accuracy trends on test data (100 real images) trained with various combinations of real and synthetic data generated by LDM and StyleGAN models.
  • Figure 5: Accuracy differences between training the VGG model from scratch versus using a pre-trained model, based on the same data combinations as in Figure 4.
  • ...and 3 more figures
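
The training combinations of real and synthetic data referred to in the captions of Figures 3 and 4 can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `build_training_set`, the per-class counts, and the image identifiers are all hypothetical, and images are represented by placeholder strings rather than pixel data.

```python
import random

# The five embryo classes named in the abstract.
CLASSES = ["2-cell", "4-cell", "8-cell", "morula", "blastocyst"]


def build_training_set(real, synthetic, n_real_per_class, n_synth_per_class, seed=0):
    """Return a shuffled list of (image_id, label, source) tuples.

    `real` and `synthetic` map each class label to a list of image
    identifiers. Both names and the tuple layout are illustrative only.
    """
    rng = random.Random(seed)
    mixed = []
    for label in CLASSES:
        pools = (
            ("real", real, n_real_per_class),
            ("synthetic", synthetic, n_synth_per_class),
        )
        for source, pool, n in pools:
            items = pool.get(label, [])
            if n > len(items):
                raise ValueError(f"not enough {source} images for class {label}")
            # Sample without replacement so no image is duplicated.
            mixed.extend((img, label, source) for img in rng.sample(items, n))
    rng.shuffle(mixed)
    return mixed


# Toy pools of placeholder identifiers (hypothetical sizes).
real = {c: [f"real_{c}_{i}" for i in range(10)] for c in CLASSES}
synthetic = {c: [f"synth_{c}_{i}" for i in range(50)] for c in CLASSES}

train = build_training_set(real, synthetic, n_real_per_class=5, n_synth_per_class=20)
print(len(train))
```

Setting `n_real_per_class=0` gives a synthetic-only training set and `n_synth_per_class=0` gives a real-only one, so a single helper covers each kind of combination. The test images would be kept in a separate pool of real images that never enters either training pool.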