Efficient endometrial carcinoma screening via cross-modal synthesis and gradient distillation

Dongjing Shan, Yamei Luo, Jiqing Xuan, Lu Huang, Jin Li, Mengchu Yang, Zeyu Chen, Fajin Lv, Yong Tang, Chunxiang Zhang

TL;DR

This work presents an automated, highly efficient two-stage deep learning framework that addresses both the data and the computational bottlenecks in EC screening. It includes a structure-guided cross-modal generation network that synthesizes diverse, high-fidelity ultrasound images from unpaired magnetic resonance imaging data while strictly preserving clinically essential anatomical junctions.
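The TL;DR does not say how structure preservation is enforced. As a minimal sketch, assume a CycleGAN-style generator trained with an adversarial term, an L1 cycle-consistency term, and an extra edge-consistency term that compares normalized Sobel gradient magnitudes of the MRI input and the synthetic ultrasound image. Under those assumptions the generator objective could be assembled as below. The function names, the Sobel-based structure term, and the weights `lambda_cyc` and `lambda_struct` are all hypothetical. This is not the authors' SG-CycleGAN loss.

```python
import numpy as np

# Hypothetical sketch: a structure-preserving term added to a CycleGAN-style
# generator objective. Not the paper's SG-CycleGAN formulation.

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def sobel_edges(img: np.ndarray) -> np.ndarray:
    """Gradient-magnitude edge map of a 2-D image (3x3 Sobel, edge padding)."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += SOBEL_X[i, j] * window
            gy += SOBEL_Y[i, j] * window
    return np.sqrt(gx ** 2 + gy ** 2)


def structure_loss(mri: np.ndarray, synth_us: np.ndarray, eps: float = 1e-8) -> float:
    """Mean absolute difference of max-normalized edge maps (source vs. synthetic)."""
    e_src = sobel_edges(mri)
    e_src = e_src / (e_src.max() + eps)
    e_syn = sobel_edges(synth_us)
    e_syn = e_syn / (e_syn.max() + eps)
    return float(np.mean(np.abs(e_src - e_syn)))


def cycle_loss(x: np.ndarray, x_reconstructed: np.ndarray) -> float:
    """L1 cycle-consistency: MRI -> US -> MRI should return the input."""
    return float(np.mean(np.abs(x - x_reconstructed)))


def generator_objective(mri, synth_us, mri_reconstructed, adv_loss,
                        lambda_cyc=10.0, lambda_struct=5.0) -> float:
    """Adversarial + weighted cycle + weighted structure terms (weights assumed)."""
    return (adv_loss
            + lambda_cyc * cycle_loss(mri, mri_reconstructed)
            + lambda_struct * structure_loss(mri, synth_us))
```

Because the edge term uses gradient magnitude, an intensity-inverted copy of the same anatomy scores zero, while a copy whose boundaries are shifted is penalized. That is the intuition behind tolerating cross-modal contrast differences while protecting anatomical junctions.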

Abstract

Early detection of myometrial invasion is critical for the staging and life-saving management of endometrial carcinoma (EC), a malignancy prevalent worldwide. Transvaginal ultrasound is the primary, accessible screening modality in resource-constrained primary care settings; however, its diagnostic reliability is severely limited by low tissue contrast, high operator dependence, and a pronounced scarcity of positive pathological samples. Existing artificial intelligence solutions struggle to overcome this severe class imbalance and the subtle imaging features of invasion, particularly under the strict computational limits of primary care clinics. Here we present an automated, highly efficient two-stage deep learning framework that addresses both the data and the computational bottlenecks in EC screening. To mitigate pathological data scarcity, we develop a structure-guided cross-modal generation network that synthesizes diverse, high-fidelity ultrasound images from unpaired magnetic resonance imaging (MRI) data while strictly preserving clinically essential anatomical junctions. Furthermore, we introduce a lightweight screening network based on gradient distillation, which transfers discriminative knowledge from a high-capacity teacher model to dynamically guide sparse attention towards task-critical regions. Evaluated on a large multicenter cohort of 7,951 participants, our model achieves a sensitivity of 99.5%, a specificity of 97.2%, and an area under the curve (AUC) of 0.987 at a computational cost of only 0.289 GFLOPs, substantially exceeding the average diagnostic accuracy of expert sonographers. Our approach demonstrates that combining cross-modal synthetic augmentation with knowledge-driven efficient modeling can democratize expert-level, real-time cancer screening in resource-constrained primary care settings.
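The abstract does not define the gradient signal that is distilled. The following is a minimal sketch under stated assumptions: the teacher produces a Grad-CAM-style saliency map (gradients of its invasion logit with respect to its last feature maps), the student's per-token attention scores are trained to match that map through a KL divergence, and the student then keeps only its top-k tokens as its sparse attention set. The GAP-plus-linear teacher head, the KL loss, the top-k rule, and every function name here are hypothetical.

```python
import numpy as np

# Hypothetical sketch of gradient-guided attention distillation; the teacher
# head, loss, and top-k rule are assumptions, not the paper's formulation.


def teacher_gradient_map(features: np.ndarray, head_weights: np.ndarray) -> np.ndarray:
    """Grad-CAM-style saliency for a teacher whose logit is head_weights . GAP(features).

    features: (C, H, W) last-layer feature maps; head_weights: (C,) linear head.
    """
    _, h, w = features.shape
    # For logit = sum_c w_c * mean(A_c), d logit / d A_c[i, j] = w_c / (H * W) everywhere.
    grads = np.broadcast_to(head_weights[:, None, None] / (h * w), features.shape)
    alphas = grads.mean(axis=(1, 2))              # per-channel importance weights
    cam = np.tensordot(alphas, features, axes=1)  # weighted channel sum -> (H, W)
    return np.maximum(cam, 0.0)                   # keep only evidence for the class


def log_softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over a 1-D array."""
    z = x - x.max()
    return z - np.log(np.exp(z).sum())


def distillation_loss(teacher_map: np.ndarray, student_logits: np.ndarray,
                      eps: float = 1e-8) -> float:
    """KL(teacher || student) between spatial distributions over the H*W tokens."""
    p = teacher_map.ravel() + eps
    p = p / p.sum()
    log_q = log_softmax(student_logits.ravel())
    return float(np.sum(p * (np.log(p) - log_q)))


def select_sparse_tokens(student_logits: np.ndarray, k: int) -> np.ndarray:
    """Flat indices of the k highest-scoring tokens the student attends to."""
    return np.sort(np.argsort(student_logits.ravel())[-k:])
```

For example, with a two-channel teacher whose first channel fires at token (1, 1) and whose head weights are `[1.0, -1.0]`, the saliency peaks at (1, 1). A student whose logits also peak there incurs a lower distillation loss than a uniform student, and `select_sparse_tokens(..., k=1)` returns that token's flat index.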

Paper Structure (30 sections, 23 equations, 10 figures, 8 tables, 1 algorithm)

Figures (10)

  • Figure 1: Visual comparison of MRI-to-ultrasound image synthesis results from different models. (a) Original axial MR images of the uterus. (b-f) Corresponding synthetic ultrasound images generated by the five compared models.
  • Figure 2: Visualization of selected feature maps from the modality-agnostic feature extractor (MAFE).
  • Figure 3: Receiver operating characteristic (ROC) curves of the downstream invasion classifier trained on synthetic data. The lightweight MobileNet-V2 classifier was trained using synthetic ultrasound images generated by (a) CycleGAN, (b) UNIT, (c) MUNIT, (d) DCLGAN, and (e) the proposed SG-CycleGAN.
  • Figure 4: Comparison of receiver operating characteristic (ROC) curves among lightweight models for myometrial invasion classification.
  • Figure 5: Boxplots comparing bootstrap distributions of performance metrics across three independent experiments. Each box shows the median (center line), interquartile range (box bounds), and range (whiskers) for accuracy, sensitivity, specificity, precision, F1-score, and ROC AUC.
  • ...and 5 more figures
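Figures 3 and 4 report ROC curves, and Figure 5 reports bootstrap distributions of the screening metrics. The sketch below shows one way to compute sensitivity, specificity, and ROC AUC from per-case scores and then bootstrap them. The 0.5 decision threshold, the 1,000 resamples drawn with replacement, and all function names are assumptions, because this page does not describe the authors' evaluation code.

```python
import numpy as np

# Hypothetical evaluation sketch; threshold, resample count, and names are assumed.


def sens_spec(y_true, y_pred):
    """Sensitivity (TP / (TP + FN)) and specificity (TN / (TN + FP)) from binary labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores) -> float:
    """AUC as P(random positive outranks random negative); ties count one half."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    diff = scores[y_true][:, None] - scores[~y_true][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def bootstrap(y_true, scores, threshold=0.5, n_boot=1000, seed=0):
    """Resample cases with replacement and collect per-resample metrics."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n = len(y_true)
    out = {"sensitivity": [], "specificity": [], "auc": []}
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        yt, sc = y_true[idx], scores[idx]
        if yt.all() or not yt.any():
            continue  # resample lacks one class, so the metrics are undefined
        se, sp = sens_spec(yt, sc >= threshold)
        out["sensitivity"].append(se)
        out["specificity"].append(sp)
        out["auc"].append(roc_auc(yt, sc))
    return {name: np.array(values) for name, values in out.items()}
```

A 95% percentile interval is then `np.percentile(dist["auc"], [2.5, 97.5])`. The pairwise AUC is quadratic in cohort size. For thousands of participants, a rank-based (Mann-Whitney U) formula or scikit-learn's `roc_auc_score` is the more practical choice.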