SupReMix: Supervised Contrastive Learning for Medical Imaging Regression with Mixup

Yilei Wu, Zijian Dong, Chongyao Chen, Wangchunshu Zhou, Juan Helen Zhou

TL;DR

SupReMix tackles the challenge of learning robust representations for medical-imaging regression by introducing embedding-level mixup to create hard negatives and hard positives that encode ordinal relationships. A distance-magnifying, label-aware loss encourages continuous, globally ordered, and locally linear representations, backed by theoretical guarantees. Empirically, SupReMix consistently outperforms classification-based and existing regression-focused contrastive methods across six datasets spanning four modalities (MRI, X-ray, ultrasound, PET) and several tasks (brain age, bone age, ejection fraction, SUVR), including a notable improvement on RSNA bone age (MAE drops from 6.79 to 4.08 months), and shows strong transferability and few-shot resilience. The approach also supports pretraining to boost task-specific models and enables gender-aware representations in bone-age assessment, underscoring its practical value for robust, cross-site medical diagnostics.

Abstract

In medical image analysis, regression plays a critical role in computer-aided diagnosis. It enables quantitative measurements such as age prediction from structural imaging, cardiac function quantification, and molecular measurement from PET scans. While deep learning has shown promise for these tasks, most approaches focus solely on optimizing regression loss or model architecture, neglecting the quality of learned feature representations which are crucial for robust clinical predictions. Directly applying representation learning techniques designed for classification to regression often results in fragmented representations in the latent space, yielding sub-optimal performance. In this paper, we argue that the potential of contrastive learning for medical image regression has been overshadowed due to the neglect of two crucial aspects: ordinality-awareness and hardness. To address these challenges, we propose Supervised Contrastive Learning for Medical Imaging Regression with Mixup (SupReMix). It takes anchor-inclusive mixtures (mixup of the anchor and a distinct negative sample) as hard negative pairs and anchor-exclusive mixtures (mixup of two distinct negative samples) as hard positive pairs at the embedding level. This strategy formulates harder contrastive pairs by integrating richer ordinal information. Through theoretical analysis and extensive experiments on six datasets spanning MRI, X-ray, ultrasound, and PET modalities, we demonstrate that SupReMix fosters continuous ordered representations, significantly improving regression performance.
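The two mixture types described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `Beta(2, 2)` defaults, and the choice to leave mixed embeddings un-normalized are assumptions made for the example. The one step taken from the description is that a hard positive mixes two negatives so that the convex combination of their labels equals the anchor's label, which fixes the mixing weight.

```python
import random


def mix(z_a, z_b, lam):
    """Convex combination lam * z_a + (1 - lam) * z_b of two embeddings."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(z_a, z_b)]


def mix_neg(z_anchor, y_anchor, z_neg, y_neg, alpha=2.0, beta=2.0, rng=random):
    """Anchor-inclusive mixture (hard negative).

    The weight is sampled from Beta(alpha, beta); alpha and beta are
    hypothetical defaults, not values taken from the paper.
    """
    lam = rng.betavariate(alpha, beta)
    z_mix = mix(z_anchor, z_neg, lam)
    y_mix = lam * y_anchor + (1.0 - lam) * y_neg  # label of the mixture
    return z_mix, y_mix, lam


def mix_pos(y_anchor, z_j, y_j, z_k, y_k):
    """Anchor-exclusive mixture (hard positive).

    Mixes two negatives whose labels lie on opposite sides of the anchor
    label. The weight is deterministic: it solves
    lam * y_j + (1 - lam) * y_k == y_anchor.
    Returns None when no such convex combination exists.
    """
    if not min(y_j, y_k) < y_anchor < max(y_j, y_k):
        return None
    lam = (y_anchor - y_k) / (y_j - y_k)
    return mix(z_j, z_k, lam), lam


# Toy 2-D embeddings with scalar labels (illustrative values only).
z_anchor, y_anchor = [1.0, 0.0], 10.0
z_j, y_j = [0.0, 1.0], 6.0
z_k, y_k = [0.5, 0.5], 12.0

hard_neg, y_hard_neg, lam1 = mix_neg(z_anchor, y_anchor, z_j, y_j,
                                     rng=random.Random(0))
hard_pos, lam2 = mix_pos(y_anchor, z_j, y_j, z_k, y_k)
```

In this toy case the hard positive uses a weight of 1/3 on the first negative, since (1/3)·6 + (2/3)·12 = 10 matches the anchor label. A pair whose labels both fall on the same side of the anchor yields no hard positive.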

Paper Structure

This paper contains 66 sections, 3 theorems, 19 equations, 16 figures, and 14 tables.

Key Result

Theorem 3.1

Given any two negative pairs (real or mixture) with similarities $s^{m,m'}_{i,j} := \mathbf{z}^{\top}_{m,i} \cdot \mathbf{z}_{m',j}$ and $s^{m,m''}_{i,l} := \mathbf{z}^{\top}_{m,i} \cdot \mathbf{z}_{m'',l}$, where $m \neq m' \neq m''$ and $|m'-m| > |m''-m|$, we always have $\nabla_{1}=\frac{\partial \mathcal{L}}{\partial s^{m,m'}_{i,j}}>0$ ...

Figures (16)

  • Figure 1: Overview of SupReMix framework. In pretraining (Panel A), the model is trained to learn task-specific representations ($z_{m,i}$) through mixup and contrast. In linear probing (Panel B), a linear regressor is trained to predict outcomes such as bone age, ejection fraction, amyloid SUVR, and brain age based on the learned representations. Example input modalities include hand X-ray, cardiac ultrasound, amyloid PET, and brain MRI (Panel C). SupReMix is designed to generalize across diverse medical imaging regression tasks.
  • Figure 2: Schematic overview of the SupReMix method and comparison with SupCon and RNC. A. An encoder first encodes inputs to embeddings. Given an anchor, Mix-neg are obtained through mixups ($\lambda_1 \sim \text{Beta}(\alpha, \beta)$) of the anchor itself and a negative in the latent space. Meanwhile, Mix-pos are derived from mixups of two negative embeddings whose labels have a convex combination equal to the anchor's label ($\lambda_2$ is therefore deterministic for a given pair of embeddings). B. SupCon identifies samples with the same label as positives and those with different labels as negatives, whereas RNC determines positives and negatives through a relative approach. SupReMix further refines this process by introducing hard positives and negatives alongside the conventional real ones. SupReMix holds a key advantage over RNC: it does not require input augmentation, which can be difficult when dealing with modalities such as time series.
  • Figure 3: Visualization (2D t-SNE map; van der Maaten and Hinton, 2008) of learned representations from the RSNA dataset (Halabi et al., 2019) with genuine and permuted bone age labels. A: representations from genuine labels; B: representations from permuted labels. Our method produces continuous and ordered representations in the latent space that would be disrupted if the labels were permuted. In contrast, classification-based methods like SupCon create clusters regardless of whether the labels are genuine or permuted.
  • Figure 4: Comparison of average logits over training (epochs). SupCon exhibits early logit saturation due to exhausted contrastive pairs, while SupReMix maintains gradual improvement without early saturation. The logit values (left y-axis) and Pearson correlation (right y-axis) are tracked over 100 training epochs.
  • Figure 5: Mean Absolute Error (MAE) comparisons across datasets. The figure compares the MAE of six methods (Vanilla, SimCLR, SupCon, AdaCon, RNC, and SupReMix) across six datasets (UK Biobank, HCP-Lifespan, Echo-Net, RSNA, RHPE, and A4). Statistical significance between methods, determined by paired t-tests, is annotated, where *** indicates $p < 0.001$, ** indicates $p < 0.01$, and ns denotes no significant difference. SupReMix achieves the lowest MAE on most datasets, indicating superior performance on a broad range of medical imaging regression tasks.
  • ...and 11 more figures
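
The evaluation summarized in the Figure 5 caption (MAE per method, compared with paired t-tests) can be sketched with the standard library. This is an illustrative sketch under assumptions: it pairs per-sample absolute errors from two methods on the same test cases, which the caption does not specify, and the numbers are toy values, not results from the paper. Only the t statistic and degrees of freedom are computed; a p-value needs the Student t distribution, which a library routine such as SciPy's `scipy.stats.ttest_rel` provides.

```python
import math
from statistics import mean, stdev


def abs_errors(y_true, y_pred):
    """Per-sample absolute errors."""
    return [abs(t - p) for t, p in zip(y_true, y_pred)]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs_errors(y_true, y_pred))


def paired_t_statistic(errors_a, errors_b):
    """Paired t statistic for matched error lists.

    Returns (t, degrees_of_freedom), where t is the mean paired difference
    divided by its standard error.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Toy labels and predictions for two hypothetical methods.
y_true = [10.0, 12.0, 14.0, 16.0]
pred_a = [11.0, 14.0, 13.0, 19.0]
pred_b = [10.5, 12.5, 14.5, 16.5]

mae_a = mae(y_true, pred_a)
mae_b = mae(y_true, pred_b)
t_stat, dof = paired_t_statistic(abs_errors(y_true, pred_a),
                                 abs_errors(y_true, pred_b))
```

A positive statistic here means the first method has larger errors on average than the second over the paired samples.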

Theorems & Definitions (7)

  • Theorem 3.1: Distance Magnifying
  • Lemma 3.2: Lower bound
  • Theorem 3.3: Infimum
  • proof
  • proof
  • proof
  • proof