EchoDFKD: Data-Free Knowledge Distillation for Cardiac Ultrasound Segmentation using Synthetic Data

Grégoire Petit, Nathan Palluau, Axel Bauer, Clemens Dlaska

TL;DR

A model trained exclusively by knowledge distillation, on either real or synthetic data, by learning to recover the masks suggested by a teacher model, achieves state-of-the-art (SOTA) results on the task of identifying end-diastolic and end-systolic frames.

Abstract

The application of machine learning to medical ultrasound videos of the heart, i.e., echocardiography, has recently gained traction with the availability of large public datasets. Traditional supervised tasks, such as ejection fraction regression, are now making way for approaches focusing more on the latent structure of data distributions, as well as generative methods. We propose a model trained exclusively by knowledge distillation, on either real or synthetic data, by learning to recover the masks suggested by a teacher model. We achieve state-of-the-art (SOTA) results on the task of identifying end-diastolic and end-systolic frames. When trained only on synthetic data, the model reaches segmentation performance close to that obtained when training on real data, with a significantly reduced number of weights. A comparison with the five main existing methods shows that our method outperforms the others in most cases. We also present a new evaluation method that does not require human annotation and instead relies on a large auxiliary model. We show that this method produces scores consistent with those obtained from human annotations. By relying on the integrated knowledge from a vast number of records, this method overcomes certain inherent limitations of human annotator labeling. Code: https://github.com/GregoirePetit/EchoDFKD

Paper Structure (29 sections, 8 equations, 14 figures, 3 tables)

Figures (14)

  • Figure 1: Overview of EchoDFKD. In Knowledge Distillation, real data, e.g., from EchoNet-Dynamic [ouyang2020echonet], is often used to train a Teacher model and then used to train a Student model. EchoDFKD instead uses synthetic data from the EchoNet-Synthetic dataset [reynaud2024EchoNetsynthetic] to distill knowledge. By analyzing either the masks generated by our ConvLSTM-based EchoDFKD segmentator or the similarity outputs between custom EchoCLIP [christensen2024echoclip] prompts and EchoNet-Dynamic raw images, we can predict the end-diastolic and end-systolic frames, scored with the average Frame Distance (aFD). Additionally, our EchoDFKD segmentator, described in Subsec. \ref{subsec:archi}, is evaluated against EchoCLIP knowledge to assess its segmentation quality, alongside traditional metrics (meanIoU, Dice score) computed against human labels.
  • Figure 2: Segmentation examples from an EchoNet-Dynamic video [(a), (b)] and an EchoNet-Synthetic video [(c)]. In (a), EchoDFKD predictions are shown in the green (G) channel, while DeepLabv3 predictions are displayed in the red (R) channel. Image (b) is the original, unaltered EchoNet-Dynamic frame. Image (c) represents a segmentation mask of DeepLabv3 on the EchoNet-Synthetic dataset used to train EchoDFKD.
  • Figure 3: Illustration of the relationship between the number of model parameters and three key performance metrics: mean Intersection over Union (meanIoU), Dice score, and our custom EchoCLIP score. The meanIoU and Dice score, displayed on the left y-axis, show how segmentation accuracy against human annotators improves with increased model complexity. Our EchoCLIP score, shown on the right y-axis, reflects the segmentation quality without needing any annotator. In the EchoCLIP segmentation quality assessment, the segmentation quality is determined by the difference between the prompts "LEFT VENTRICLE" and "NOTHING" applied on raw masks that have been expanded by a few pixels.
  • Figure 4: Relationship between sampling rate and mean aFD error for EchoDFKD. This plot shows the mean aFD (average Frame Distance) error on the EchoNet-Dynamic dataset as a function of the sampling rate used for EchoDFKD.
  • Figure 5: Comparison of DeepLabv3 mask areas and prompt similarities. f is the cumulative sum of the difference between the similarities to the prompts "THE MITRAL VALVE IS CLOSED" and "THE MITRAL VALVE IS OPEN", with the linear trend removed.
  • ...and 9 more figures
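
Two quantities named in the captions of Figures 4 and 5 can be illustrated with a minimal sketch. It rests on assumptions that the captions do not confirm: aFD is taken here as the mean absolute gap, in frames, between predicted and reference key frames (its usual definition); the "linear trend" is removed by subtracting a least-squares line; and the two prompt-similarity series are supplied as plain per-frame lists. The function names `average_frame_distance` and `detrended_cumsum` are hypothetical and are not taken from the EchoDFKD repository.

```python
from itertools import accumulate
from typing import List, Sequence


def average_frame_distance(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Mean absolute gap, in frames, between predicted and reference key frames.

    Assumption: this is the usual aFD definition, not one confirmed by the captions.
    """
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def detrended_cumsum(closed_sim: Sequence[float], open_sim: Sequence[float]) -> List[float]:
    """Cumulative sum of (closed - open) similarities minus its least-squares line.

    Assumption: "linear trend removed" is read as subtracting an ordinary
    least-squares fit over the frame index.
    """
    if len(closed_sim) != len(open_sim) or len(closed_sim) < 2:
        raise ValueError("need two sequences of equal length with at least 2 frames")

    # Running total of the per-frame similarity difference.
    cumulative = list(accumulate(c - o for c, o in zip(closed_sim, open_sim)))

    # Closed-form least-squares line over frame indices 0..n-1.
    n = len(cumulative)
    mean_t = (n - 1) / 2
    mean_y = sum(cumulative) / n
    var_t = sum((t - mean_t) ** 2 for t in range(n))
    slope = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(cumulative)) / var_t
    intercept = mean_y - slope * mean_t

    # Residual after removing the fitted line.
    return [y - (slope * t + intercept) for t, y in enumerate(cumulative)]
```

For example, `average_frame_distance([10, 42], [12, 41])` gives `1.5`, since the two predictions miss their references by 2 and 1 frames. A constant similarity difference yields a perfectly linear cumulative sum, so `detrended_cumsum` returns values that are zero up to floating-point error; only periodic deviations from that line survive detrending.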