Multimodal Learning for Embryo Viability Prediction in Clinical IVF

Junsik Kim, Zhiyi Shi, Davin Jeong, Johannes Knittel, Helen Y. Yang, Yonghyun Song, Wanhua Li, Yicong Li, Dalit Ben-Yosef, Daniel Needleman, Hanspeter Pfister

TL;DR

A multimodal model that leverages both time-lapse video data and Electronic Health Records (EHRs) is developed to predict embryo viability, enabling fast and automated embryo viability predictions at scale for clinical IVF.

Abstract

In clinical In-Vitro Fertilization (IVF), identifying the most viable embryo for transfer is important for increasing the likelihood of a successful pregnancy. Traditionally, embryologists manually assess embryos' static morphological features at specific intervals using light microscopy. This manual evaluation is time-intensive and costly because it requires expert analysis, and it is inherently subjective, which leads to variability in the selection process. To address these challenges, we develop a multimodal model that leverages both time-lapse video data and Electronic Health Records (EHRs) to predict embryo viability. One of the primary challenges of our research is to combine time-lapse video and EHR data effectively, given their inherent differences in modality. We comprehensively analyze our multimodal model across various input modalities and integration approaches. Our approach will enable fast and automated embryo viability predictions at scale for clinical IVF.

Paper Structure

This paper contains 15 sections, 5 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Overview of our multimodal model. Video data is first tokenized into patches per frame. Then, the spatial transformer encodes per-frame embeddings. The multimodal transformer takes both the frame embeddings and an EHR embedding as input and outputs a multimodal feature. Lastly, the MLP head predicts embryo viability from the multimodal feature. If additional video or tabular inputs are available, such as outputs from Embryo-vision [leahy2020automated] or BlastAssist [yang2024blastassist], they are processed in the same manner as the video input and the EHR input, respectively.
  • Figure 2: Overview of the two-stage approach. First, morphological features $\mathbf{v}'$ are extracted from videos using Embryo-vision [leahy2020automated]. Then, the extracted features $\mathbf{v}'$ are converted to interpretable features $\mathbf{e}'$ in tabular format using BlastAssist [yang2024blastassist]. Lastly, the tabular model takes the EHRs $\mathbf{e}$ and the interpretable features $\mathbf{e}'$ as input to predict embryo viability.
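
The data flow described in the Figure 1 caption can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single-head attention layers, random weights, mean pooling, grayscale frames, and all names (`patchify`, `self_attention`, `make_params`, `predict_viability`) and dimensions are assumptions chosen only to show how per-frame patch tokens, frame embeddings, and an EHR embedding could be combined into one viability score.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(video, patch):
    """Split (T, H, W) frames into flattened non-overlapping patches: (T, N, patch*patch)."""
    T, H, W = video.shape
    assert H % patch == 0 and W % patch == 0, "frame size must be divisible by patch size"
    x = video.reshape(T, H // patch, patch, W // patch, patch)
    x = x.transpose(0, 1, 3, 2, 4)  # (T, H/p, W/p, p, p)
    return x.reshape(T, (H // patch) * (W // patch), patch * patch)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the token axis."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def make_params(patch, ehr_dim, dim=16, hidden=32):
    """Random weights standing in for trained parameters (hypothetical sizes)."""
    init = lambda *shape: rng.normal(scale=0.1, size=shape)
    return {
        "patch_proj": init(patch * patch, dim),
        "spatial": tuple(init(dim, dim) for _ in range(3)),
        "ehr_proj": init(ehr_dim, dim),
        "multimodal": tuple(init(dim, dim) for _ in range(3)),
        "w1": init(dim, hidden),
        "w2": init(hidden),
    }

def predict_viability(video, ehr, params, patch=8):
    # 1) Tokenize each frame into patches and project them to the model width.
    tokens = patchify(video, patch) @ params["patch_proj"]          # (T, N, D)
    # 2) Spatial stage: attend among patches within each frame, then
    #    mean-pool to one embedding per frame (pooling is an assumption).
    frame_emb = self_attention(tokens, *params["spatial"]).mean(axis=1)  # (T, D)
    # 3) Multimodal stage: prepend an EHR token and attend across all tokens.
    ehr_emb = ehr @ params["ehr_proj"]                              # (D,)
    seq = np.concatenate([ehr_emb[None, :], frame_emb], axis=0)     # (T+1, D)
    feature = self_attention(seq, *params["multimodal"]).mean(axis=0)  # (D,)
    # 4) MLP head: one hidden ReLU layer, sigmoid output in (0, 1).
    hidden = np.maximum(0.0, feature @ params["w1"])
    logit = hidden @ params["w2"]
    return float(1.0 / (1.0 + np.exp(-logit)))

# Usage with synthetic data: 5 frames of 32x32 pixels and 6 EHR fields.
video = rng.random((5, 32, 32))
ehr = rng.random(6)
params = make_params(patch=8, ehr_dim=6)
prob = predict_viability(video, ehr, params, patch=8)
```

In this sketch the EHR record enters as a single extra token alongside the frame embeddings, which is one simple way to let attention mix the two modalities. Additional video-like or tabular inputs, as mentioned in the caption, would add further tokens to the same sequence under this assumption.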