Cytoplasmic Strings Analysis in Human Embryo Time-Lapse Videos using Deep Learning Framework

Anabia Sohail, Mohamad Alansari, Ahmed Abughali, Asmaa Chehab, Abdelfatah Ahmed, Divya Velayudhan, Sajid Javed, Hasan Al Marzouqi, Ameena Saad Al-Sumaiti, Junaid Kashir, Naoufel Werghi

TL;DR

This paper tackles automated detection of cytoplasmic strings (CS) in human embryo time-lapse videos, a rare biomarker linked to improved viability. It introduces a human-in-the-loop annotation pipeline to curate a CS dataset and a two-stage framework that first classifies CS presence per frame and then localizes CS regions, guided by the Novel Uncertainty-aware Contractive Embedding (NUCE) loss. NUCE combines uncertainty-aware weighting with contractive embedding to address severe class imbalance and subtle CS patterns, yielding consistent gains across multiple transformer backbones. For localization, RF-DETR achieves state-of-the-art performance on detecting extremely thin, low-contrast CS structures, demonstrating the practical potential of automated CS assessment in embryo evaluation.

Abstract

Infertility is a major global health issue, and while in-vitro fertilization (IVF) has improved treatment outcomes, embryo selection remains a critical bottleneck. Time-lapse imaging (TLI) enables continuous, non-invasive monitoring of embryo development, yet most automated assessment methods rely solely on conventional morphokinetic features and overlook emerging biomarkers. Cytoplasmic strings (CS), thin filamentous structures connecting the inner cell mass and trophectoderm in expanded blastocysts, have been associated with faster blastocyst formation, higher blastocyst grades, and improved viability. However, CS assessment currently depends on manual visual inspection, which is labor-intensive, subjective, and hampered by the subtle visual appearance of CS, which makes them difficult to detect. In this work, we present, to the best of our knowledge, the first computational framework for CS analysis in human IVF embryos. We first design a human-in-the-loop annotation pipeline to curate a biologically validated CS dataset from TLI videos, comprising 13,568 frames with highly sparse CS-positive instances. Building on this dataset, we propose a two-stage deep learning framework that (i) classifies CS presence at the frame level and (ii) localizes CS regions in positive cases. To address severe class imbalance and feature uncertainty, we introduce the Novel Uncertainty-aware Contractive Embedding (NUCE) loss, which couples confidence-aware reweighting with an embedding contraction term to form compact, well-separated class clusters. NUCE consistently improves F1-score across five transformer backbones, while RF-DETR-based localization achieves state-of-the-art (SOTA) detection performance for thin, low-contrast CS structures. The source code will be made publicly available at: https://github.com/HamadYA/CS_Detection.
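
The abstract describes NUCE only at a high level, as confidence-aware reweighting coupled with an embedding contraction term. The sketch below is one possible reading of that description and not the authors' formulation: the focal-style weight `(1 - p_true) ** gamma`, the centroid-based contraction term, and the names `nuce_like_loss`, `gamma`, and `lam` are all assumptions made for illustration. It models cluster compactness only and includes no explicit separation term.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def nuce_like_loss(logits, embeddings, labels, gamma=2.0, lam=0.1):
    """Hypothetical NUCE-style objective (illustrative, not the paper's equations).

    logits:     (n, num_classes) classifier outputs
    embeddings: (n, d) feature vectors from the backbone
    labels:     (n,) integer class labels
    """
    n = labels.shape[0]
    p_true = softmax(logits)[np.arange(n), labels]

    # Assumed confidence-aware reweighting: samples the model is unsure
    # about (low p_true) receive a larger weight, as in focal-style losses.
    weights = (1.0 - p_true) ** gamma
    weighted_ce = np.mean(weights * -np.log(p_true + 1e-12))

    # Assumed contraction term: mean squared distance of each embedding
    # to the centroid of its own class, which rewards compact clusters.
    contraction = 0.0
    for c in np.unique(labels):
        members = embeddings[labels == c]
        contraction += np.sum((members - members.mean(axis=0)) ** 2)
    contraction /= n

    return float(weighted_ce + lam * contraction)
```

With the logits held fixed, a batch whose same-class embeddings sit close together scores lower than one whose embeddings are spread out, which is the behaviour the contraction term is meant to encourage.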

Paper Structure

This paper contains 16 sections, 14 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Expanded blastocyst with CS (CS+ group) and without CS (CS- group). One CS is shown in the CS+ embryo traversing the blastocoel cavity (red arrows) and maintaining a connection between the inner cell mass (ICM) and the mural trophectoderm (TE) cells.
  • Figure 2: Performance comparison of baseline architectures using Cross-Entropy and the proposed NUCE loss. The Novel Uncertainty-aware Contractive Embedding (NUCE) objective consistently improves F1-score across all backbone models (ViT-B, Swin-B, and CLIP).
  • Figure 3: Overview of the annotation pipeline. A subset of time-lapse embryo videos is first manually annotated by expert embryologists, producing a verified data container used to train an automated detector. The trained auto-annotation model then generates cytoplasmic-string predictions for an unseen subset of time-lapse videos. Predicted annotations undergo verification, and all validated outputs are consolidated into a final verified dataset.
  • Figure 4: Overview of the proposed two-stage framework for CS detection in time-lapse human embryo videos. Stage 1: The classification network identifies CS presence through self-distillation. Stage 2: The localization network localizes CS regions in frames classified as positive.
  • Figure 5: Comparison of learned embedding distributions across loss functions. The high-dimensional feature embeddings extracted from the ViT-B final layer are projected into a 2D space using Principal Component Analysis (PCA) to visualize the structural differences induced by each loss function.
  • ...and 3 more figures
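
The Figure 5 caption mentions projecting high-dimensional ViT-B embeddings into 2D with PCA. A minimal sketch of such a projection follows, assuming the embeddings are available as an `(n_samples, n_features)` NumPy array. The function name `pca_project_2d` is hypothetical, and the SVD-based computation is one standard way to perform PCA, not necessarily the one used in the paper.

```python
import numpy as np

def pca_project_2d(features):
    """Project (n_samples, n_features) embeddings onto their top two principal axes."""
    # PCA operates on mean-centered data.
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T
```

The two returned columns can then be drawn as a scatter plot colored by class label to compare how tightly each loss function clusters the CS-positive and CS-negative frames.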