
Semi-Supervised Multimodal Multi-Instance Learning for Aortic Stenosis Diagnosis

Zhe Huang, Xiaowei Yu, Benjamin S. Wessler, Michael C. Hughes

TL;DR

This work introduces Semi-supervised Multimodal Multiple-Instance Learning (SMMIL), a new deep learning framework for automatically interpreting echocardiograms to diagnose structural heart diseases such as aortic stenosis (AS). SMMIL outperforms recent alternatives, including two medical foundation models.

Abstract

Automated interpretation of ultrasound imaging of the heart (echocardiograms) could improve the detection and treatment of aortic stenosis (AS), a deadly heart disease. However, existing deep learning pipelines for assessing AS from echocardiograms have two key limitations. First, most methods rely only on 2D cineloops, ignoring widely available Doppler imaging that contains important complementary information about pressure gradients and blood flow abnormalities associated with AS. Second, labeled data are difficult to obtain. Far more unlabeled echocardiogram recordings are often available, but existing methods underutilize them. To overcome these limitations, we introduce Semi-supervised Multimodal Multiple-Instance Learning (SMMIL), a new deep learning framework for automatically interpreting echocardiograms to diagnose structural heart diseases such as AS. When deployed, SMMIL can combine information from two input modalities, spectral Dopplers and 2D cineloops, to produce a study-level AS diagnosis. During training, SMMIL can combine a smaller labeled set and an abundant unlabeled set of both modalities to improve its classifier. Experiments demonstrate that SMMIL outperforms recent alternatives at 3-level AS severity classification as well as several clinically relevant AS detection tasks.
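Figure 1 describes SMMIL's semi-supervised training as an iterative loop: the first model is trained on the labeled set alone, and each later model is trained on the labeled set plus an unlabeled subset selected by the previous model, until a stopping criterion is met. The Python sketch below illustrates only that loop, under loudly stated assumptions: a toy nearest-centroid classifier (`NearestCentroid`, hypothetical) stands in for the deep multimodal network, the unlabeled subset is chosen with a hypothetical confidence `threshold` and pseudo-labeled with the model's own predictions, and the stopping criterion (the selected subset stops changing, or `max_iters` is reached) is a placeholder. The paper's actual selection rule and stopping criterion may differ.

```python
import numpy as np


class NearestCentroid:
    """Toy stand-in classifier (hypothetical); SMMIL itself uses a deep MIL network."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        # Softmax over negative squared distances to each class centroid.
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=-1)
        logits = -d - (-d).max(axis=1, keepdims=True)
        p = np.exp(logits)
        return p / p.sum(axis=1, keepdims=True)


def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_iters=5):
    # First iteration: train on the labeled set only.
    model = NearestCentroid().fit(X_lab, y_lab)
    selected = np.zeros(len(X_unlab), dtype=bool)
    prev_selected = None
    for _ in range(max_iters):
        # The previous iteration's model selects confident unlabeled examples
        # (assumed rule: max class probability >= threshold).
        proba = model.predict_proba(X_unlab)
        selected = proba.max(axis=1) >= threshold
        # Assumed stopping criterion: the selected subset no longer changes.
        if prev_selected is not None and np.array_equal(selected, prev_selected):
            break
        prev_selected = selected
        pseudo = model.classes_[proba.argmax(axis=1)]
        # Retrain on the union of the labeled set and the pseudo-labeled subset.
        X_train = np.concatenate([X_lab, X_unlab[selected]])
        y_train = np.concatenate([y_lab, pseudo[selected]])
        model = NearestCentroid().fit(X_train, y_train)
    return model, selected
```

With two labeled points far apart, the unlabeled points near each one are pseudo-labeled and absorbed, while an ambiguous point midway between them stays unselected at a strict threshold such as 0.99.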

Paper Structure

This paper contains 8 sections, 5 equations, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Overview of SMMIL. Top: Illustration of the SSL training workflow. In the first iteration, the model is trained on the labeled set. In each subsequent iteration, the model is trained on the union of the labeled set and the unlabeled subset selected by the model from the previous iteration. This process repeats until the stopping criterion is met. Bottom: Illustration of the Multimodal Multiple-Instance network. The network processes each 2D cineloop and spectral Doppler in a bag into a feature embedding, and subsequent attention pooling operations synthesize all of this information into a cohesive bag representation. Distinct feature extractors are used for the 2D cineloop and spectral Doppler branches to accommodate their unique characteristics.
  • Figure 2: Confusion Matrices for AS severity classification task across three predefined train/test splits of TMED-2.
  • Figure 3: t-SNE visualization of TTE study representations on the 3 test splits of TMED-2. The patient representations form noticeable clusters for the different AS severity levels.
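The bottom half of Figure 1 describes the network: modality-specific feature extractors embed every cineloop and Doppler in a study (the bag), and attention pooling merges those embeddings into one bag representation. The NumPy forward pass below is a minimal sketch of that idea, not the paper's architecture. It assumes pre-extracted per-instance feature vectors, hypothetical linear + ReLU extractors (`W_cine`, `W_dop`) in place of deep encoders, a single tanh-based attention pooling over all instances, a 3-way output for AS severity, and random untrained weights.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_pool(H, V, w):
    """Attention MIL pooling over instance embeddings H of shape (n_instances, d)."""
    scores = np.tanh(H @ V) @ w  # one score per instance
    a = softmax(scores)          # attention weights over instances, sum to 1
    return a @ H, a              # pooled bag vector (d,), weights (n_instances,)


class MultimodalMILSketch:
    """Hypothetical two-branch attention-MIL model with random, untrained weights."""

    def __init__(self, d_cine, d_doppler, d_embed=16, d_attn=8, n_classes=3, seed=0):
        rng = np.random.default_rng(seed)
        # Distinct (assumed linear + ReLU) feature extractors per modality.
        self.W_cine = rng.normal(scale=0.1, size=(d_cine, d_embed))
        self.W_dop = rng.normal(scale=0.1, size=(d_doppler, d_embed))
        # Attention and classifier parameters shared across the whole bag.
        self.V = rng.normal(scale=0.1, size=(d_embed, d_attn))
        self.w = rng.normal(scale=0.1, size=d_attn)
        self.W_out = rng.normal(scale=0.1, size=(d_embed, n_classes))

    def forward(self, cineloops, dopplers):
        # Embed each instance with its modality-specific extractor.
        H_cine = np.maximum(cineloops @ self.W_cine, 0.0)
        H_dop = np.maximum(dopplers @ self.W_dop, 0.0)
        # Pool all instances of the study into one bag representation.
        H = np.concatenate([H_cine, H_dop], axis=0)
        z, attn = attention_pool(H, self.V, self.w)
        # Study-level class probabilities (e.g., no / early / significant AS).
        return softmax(z @ self.W_out), attn
```

Because the attention weights are a softmax over instances, the pooled representation does not depend on instance order and accepts studies with any number of cineloops and Dopplers, which suits echocardiogram studies whose view counts vary.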