Learning to Score

Yogev Kriger, Shai Fine

TL;DR

This work tackles scoring problems where target labels are unavailable but side information is available, proposing a three-component framework that blends representation learning, information bottleneck regularization, and metric learning. By organizing the model around a shared latent space and integrating side information through a mutual-information objective and a distribution-aware triplet loss, the approach yields discriminative scores without requiring target labels during training. Empirical results on MNIST, Parkinson's disease data, and student-performance datasets show the method can match or closely approach supervised baselines and produce meaningful latent structures aligned with side information. The framework holds promise for applications across domains where expert-informed side information is available but labels are scarce or incomplete, with future work pointing toward adversarial enhancements and broader data deployments.

Abstract

Common machine learning settings range from supervised tasks, where accurately labeled data is accessible, through semi-supervised and weakly-supervised tasks, where target labels are scant or noisy, to unsupervised tasks, where labels are unobtainable. In this paper we study a scenario where the target labels are not available but additional related information is at hand. This information, referred to as Side Information, is either correlated with the unknown labels or imposes constraints on the feature space. We formulate the problem as an ensemble of three semantic components: representation learning, side information, and metric learning. The proposed scoring model is advantageous for multiple use cases. For example, in the healthcare domain it can be used to create a severity score for diseases where the symptoms are known but the criteria for disease progression are not well defined. We demonstrate the utility of the suggested scoring system on well-known benchmark datasets and biomedical patient records.

Paper Structure

This paper contains 23 sections, 13 equations, 11 figures, 1 table.

Figures (11)

  • Figure 1: The pattern assumes that the target function $f = \psi \circ \phi$ and the related function $g = \beta \circ \phi$ share $\phi$, and therefore have the same intermediate representation $z = \phi(x)$. By training the representation to predict both $y$ using $\psi$, and $s$ using the auxiliary function $\beta : z \rightarrow s$, we incorporate the assumption that related tasks share intermediate representations (jrs:15).
  • Figure 2: The model's architecture, composed of three parts: reconstruction, side information, and score inference.
  • Figure 3: VAE - Latent space visualization of the test images.
  • Figure 4: Triplet loss - Latent space visualization of the test images.
  • Figure 5: VAE & Triplet loss - Latent space visualization of the test images.
  • ...and 6 more figures
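
The shared-representation pattern described in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the random linear maps, the dimensions (4 inputs, 2 latent units), and the helper name `make_linear` are all assumptions chosen only to show how $f = \psi \circ \phi$ and $g = \beta \circ \phi$ reuse the same intermediate representation $z = \phi(x)$. Training, in which the error on $s$ would also shape $\phi$, is deliberately left out.

```python
import random


def make_linear(n_in, n_out, rng):
    """Return a random linear map R^n_in -> R^n_out (hypothetical toy layer)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

    def apply(vec):
        # Plain matrix-vector product, one output per weight row.
        return [sum(w * v for w, v in zip(row, vec)) for row in weights]

    return apply


rng = random.Random(0)          # fixed seed so the sketch is reproducible
phi = make_linear(4, 2, rng)    # shared encoder:  x -> z
psi = make_linear(2, 1, rng)    # target head:     z -> y (the score)
beta = make_linear(2, 1, rng)   # auxiliary head:  z -> s (side information)


def f(x):
    """Target function f = psi o phi."""
    return psi(phi(x))


def g(x):
    """Related function g = beta o phi; it reuses the same encoder phi."""
    return beta(phi(x))


x = [0.5, -1.0, 2.0, 0.1]       # made-up input vector
z = phi(x)                      # intermediate representation shared by f and g
score = f(x)
side = g(x)
```

Because both heads read the same `z`, any update to `phi` driven by the side-information head also changes the input seen by the scoring head, which is the assumption the caption states.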