Contrastive Learning for Semi-Supervised Deep Regression with Generalized Ordinal Rankings from Spectral Seriation

Ce Wang, Weihang Dai, Hanru Bai, Xiaomeng Li

TL;DR

The paper tackles semi-supervised deep regression by enabling unlabeled data to contribute to learning of ordinal relationships through spectral seriation. It introduces Generalized Contrastive Learning with Spectral Seriation (GCLSS), which constructs a feature similarity matrix from both labeled and unlabeled samples, derives robust unlabeled rankings via a generalized seriation algorithm, and stabilizes learning with a memory-based feature selection module. The method is backed by theoretical robustness guarantees and validated across diverse domains (medical MRI, synthetic operators, natural images, and audio), consistently outperforming state-of-the-art SSL regression methods. This work reduces annotated data requirements while delivering improved representation learning and regression accuracy, with public code available.

Abstract

Contrastive learning methods enforce label distance relationships in feature space to improve representation capability for regression models. However, these methods depend heavily on label information to correctly recover ordinal relationships of features, limiting their applications to semi-supervised regression. In this work, we extend contrastive regression methods to allow unlabeled data to be used in the semi-supervised setting, thereby reducing the dependence on costly annotations. In particular, we construct the feature similarity matrix with both labeled and unlabeled samples in a mini-batch to reflect inter-sample relationships, and an accurate ordinal ranking of the involved unlabeled samples can be recovered through spectral seriation algorithms if the level of error is within certain bounds. Including labeled samples in the matrix regularizes the ordinal ranking with guidance from the ground-truth label information, making the ranking more reliable. To reduce feature perturbations, we further use a dynamic programming algorithm to select robust features for the matrix construction. The recovered ordinal relationship is then used for contrastive learning on unlabeled samples, which allows more data to be used for feature representation learning and thereby yields more robust results. The ordinal rankings can also be used to supervise predictions on unlabeled samples, serving as an additional training signal. We provide theoretical guarantees and empirical verification through experiments on various datasets, demonstrating that our method can surpass existing state-of-the-art semi-supervised deep regression methods. Our code has been released at https://github.com/xmed-lab/CLSS.
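To make the ranking step concrete, the following is a minimal sketch of classical spectral seriation, not the authors' generalized algorithm. It builds a similarity matrix from a batch of features, forms the graph Laplacian, and orders the samples by the entries of the Fiedler vector (the eigenvector of the second-smallest eigenvalue), in line with Theorem 3.1 listed under Key Result. The function names, the shifted cosine similarity, and the unnormalized Laplacian are assumptions made for illustration; the label-guided regularization and the dynamic-programming feature selection described in the abstract are not modeled here.

```python
import numpy as np


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity, shifted into [0, 1] (an illustrative choice)."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = (z @ z.T + 1.0) / 2.0
    # Self-similarity does not affect the Laplacian, so zero it out.
    np.fill_diagonal(sim, 0.0)
    return sim


def fiedler_ranking(similarity):
    """Order samples by the Fiedler vector of the unnormalized Laplacian.

    The returned ordering is defined only up to reversal, because an
    eigenvector's sign is arbitrary.
    """
    degree = np.diag(similarity.sum(axis=1))
    laplacian = degree - similarity
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(laplacian)
    fiedler = eigvecs[:, 1]
    return np.argsort(fiedler)


if __name__ == "__main__":
    # Toy batch: hidden targets y are encoded as angles on a circular arc,
    # so the similarity between two samples decreases as |y_i - y_j| grows.
    y = np.array([0.2, 2.6, 1.1, 1.9, 0.7, 2.2])
    feats = np.stack([np.cos(y), np.sin(y)], axis=1)
    order = fiedler_ranking(cosine_similarity_matrix(feats))
    print("recovered order of hidden targets:", y[order])
```

Because the sign of an eigenvector is arbitrary, a ranking recovered this way is only determined up to reversal. In this sketch nothing resolves that ambiguity; the abstract indicates that the labeled samples in the batch supply guidance from ground-truth labels.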

Paper Structure

This paper contains 29 sections, 6 theorems, 19 equations, 2 figures, 11 tables, 1 algorithm.

Key Result

Theorem 3.1

Given similarity matrix $\mathcal{S}'$, such that ${\mathcal{S}}_{[i,j]}' > {\mathcal{S}}_{[i,k]}'$ for $| {y}_{i} - {y}_{j} | < |{y}_{i} - {y}_{k} |$, the ordinal ranking that best satisfies the observed $\mathcal{S}'$ is the ranking of the values in the Fiedler vector of $\mathbf{L}'$, where $\mat

Figures (2)

  • Figure 1: The framework of our proposed GCLSS method, where $({x}_{i}, {y}_{i})$ and $({x}_{i}^{'})$ are labeled pairs and unlabeled samples, respectively, ${\tilde{z}}_{i}$ and ${\tilde{z}}_{i}^{'}$ are the extracted features, and ${\hat{y}}_{i}$ and ${\hat{y}}_{i}^{'}$ are the predictions for labeled and unlabeled samples. Unlike existing contrastive regression works, which can only use labeled data to construct the supervised regression loss ${\cal{L}}_{SR}$ and the supervised contrastive loss ${\cal{L}}_{SC}$, we use spectral seriation to obtain ordinal rankings ${\cal{R}}^{'}$ from unlabeled samples. These rankings are used to construct the unsupervised contrastive loss ${\cal{L}}_{UC}$, which provides both contrastive learning and ranking supervision for unlabeled samples.
  • Figure 2: Visualizations of sample data from the IXI, AgeDB-DIR, and UTKFace datasets for age estimation. Images are shown with their paired age labels, which are used for training.

Theorems & Definitions (6)

  • Theorem 3.1
  • Corollary 3.2
  • Lemma 3.3
  • Lemma 3.4
  • Theorem 3.5
  • Corollary 3.6