Reliable Multi-View Learning with Conformal Prediction for Aortic Stenosis Classification in Echocardiography
Ang Nan Gu, Michael Tsang, Hooman Vaseli, Teresa Tsang, Purang Abolmaesumi
TL;DR
This work tackles uncertainty in aortic stenosis (AS) classification from echocardiography, where 2-D views of a 3-D anatomy can omit diagnostic details. It introduces Re-Training for Uncertainty (RT4U), a model-agnostic training approach that uses pseudo-labels derived from training dynamics to mitigate overfitting to noisy inputs, and couples RT4U with conformal prediction to produce adaptive prediction sets with guaranteed coverage. Across CIFAR-Q, TMED-2, and a private AS dataset, RT4U improves top-1 accuracy and calibration and yields smaller, more informative prediction sets while preserving the marginal coverage property. The approach offers a practical, uncertainty-aware pathway for scalable, automated AS screening from standard echo data, bridging data-centric training with principled uncertainty quantification.
Abstract
The fundamental problem with ultrasound-guided diagnosis is that the acquired images are often 2-D cross-sections of a 3-D anatomy, potentially missing important anatomical details. This limitation leads to challenges in echocardiography, such as poor visualization of heart valves or foreshortening of ventricles. Clinicians must interpret these images with inherent uncertainty, a nuance absent from the one-hot labels used in machine learning. We propose Re-Training for Uncertainty (RT4U), a data-centric method that introduces uncertainty to weakly informative inputs in the training set. This simple approach can be incorporated into existing state-of-the-art aortic stenosis (AS) classification methods to further improve their accuracy. When combined with conformal prediction techniques, RT4U can yield adaptively sized prediction sets that are guaranteed to contain the ground-truth class with high probability. We validate the effectiveness of RT4U on three diverse datasets: a public AS dataset (TMED-2), a private AS dataset, and a CIFAR-10-derived toy dataset. Results show improvements on all three datasets.
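To make the notion of an adaptively sized prediction set concrete, the sketch below shows standard split conformal prediction for a classifier. It is an illustration under stated assumptions and not the authors' implementation: the abstract does not say which conformal procedure or score function is used, so the nonconformity score here (one minus the softmax probability of the true class) and the helper names `conformal_threshold` and `prediction_sets` are hypothetical choices. Given a held-out calibration set that is exchangeable with the test data, the resulting sets contain the true class with probability at least 1 - alpha on average (marginal coverage).

```python
import math

import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Return the score threshold from a held-out calibration set.

    cal_probs:  (n, K) array of softmax outputs (assumed input format).
    cal_labels: (n,) array of integer class labels.
    alpha:      target miscoverage rate, e.g. 0.1 for 90% coverage.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = cal_labels.shape[0]

    # Nonconformity score: 1 - probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]

    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return 1.0
    return float(np.sort(scores)[k - 1])


def prediction_sets(test_probs, qhat):
    """Return, for each test row, the class indices whose score is <= qhat."""
    test_probs = np.asarray(test_probs, dtype=float)
    return [np.flatnonzero(1.0 - p <= qhat).tolist() for p in test_probs]
```

With this score, a confident softmax output produces a single-class set, while a flatter output produces a larger set, which is the adaptive behaviour the abstract refers to. Note that this particular score can also return an empty set when no class is confident enough; other score functions avoid that at the cost of larger sets.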
