Semi-supervised Contrastive Learning Using Partial Label Information

Colin B. Hansen, Vishwesh Nath, Diego A. Mesa, Yuankai Huo, Bennett A. Landman, Thomas A. Lasko

TL;DR

Using a careful comparison framework over well-characterized public datasets, the paper investigates the benefit of partial label information and shows that it reduces test error relative to good semi-supervised methods, usually by a factor of 2 and by up to a factor of 5.5 in the best case.

Abstract

In semi-supervised learning, information from unlabeled examples is used to improve the model learned from labeled examples. In some learning problems, partial label information can be inferred from otherwise unlabeled examples and used to further improve the model. In particular, partial label information exists when subsets of training examples are known to have the same label, even though the label itself is missing. By encouraging the model to give the same label to all such examples through contrastive learning objectives, we can potentially improve its performance. We call this encouragement Nullspace Tuning because the difference vector between any pair of examples with the same label should lie in the nullspace of a linear model. In this paper, we investigate the benefit of partial label information using a careful comparison framework over well-characterized public datasets. We show that the additional information provided by partial labels reduces test error relative to good semi-supervised methods, usually by a factor of 2 and by up to a factor of 5.5 in the best case. We also show that adding Nullspace Tuning to the newer, state-of-the-art MixMatch method decreases its test error by up to a factor of 1.8.
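
The nullspace idea in the abstract can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the toy weight matrix `W`, the example vectors, and the helper names `matvec` and `nullspace_penalty` are all hypothetical. It shows that for a linear model, two examples receive identical outputs exactly when their difference vector lies in the nullspace of the model, so the squared norm of `W(x_a - x_b)` is a natural penalty to drive toward zero for pairs known to share a label.

```python
# Hypothetical illustration of the nullspace idea; names and values are
# assumptions made for this sketch, not taken from the paper.

def matvec(W, x):
    """Apply a linear model W (given as a list of rows) to a vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def nullspace_penalty(W, x_a, x_b):
    """Squared norm of W(x_a - x_b).

    The value is zero exactly when the difference vector x_a - x_b lies in
    the nullspace of W, i.e. when the model gives both examples the same output.
    """
    diff = [a - b for a, b in zip(x_a, x_b)]
    return sum(v * v for v in matvec(W, diff))

# Toy linear model with 2 outputs and 3 input features; it ignores feature 3.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]

x_a = [0.5, -1.0, 2.0]
x_b = [0.5, -1.0, 7.0]  # differs from x_a only in the ignored feature
x_c = [1.5, -1.0, 2.0]  # differs from x_a in a feature the model uses

same_label_penalty = nullspace_penalty(W, x_a, x_b)  # difference is in the nullspace
other_penalty = nullspace_penalty(W, x_a, x_c)       # difference is not in the nullspace

print(same_label_penalty, other_penalty)
```

In a training loop, a term of this kind would presumably be added to the usual supervised loss for every pair of examples known to share a label; for a nonlinear network the analogous quantity is a distance between the two model outputs, and the exact form used in the paper is not given in this excerpt.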

Paper Structure

This paper contains 19 sections, 13 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: In medical imaging, partial label information commonly comes in the form of repeat acquisitions of a subject. Assuming these acquisitions are made within a short enough time that aging does not affect the anatomy, models can leverage the differences between acquisitions that arise from differences in acquisition parameters. These may be differences in contrast, such as between T1-weighted MRI (top left) and T2 MRI (bottom left) or between non-contrast phase CT (top center-right) and portal venous phase CT (bottom center-right). The manufacturer of the imaging equipment may also be a factor, as shown by the diffusion MRI fractional anisotropy (FA) estimated from a Prisma scanner (top center-left) and the FA estimated from a Connectom scanner (bottom center-left). Repeat acquisitions with the same parameters and hardware can also provide useful information, as in repeat heart CT (right).
  • Figure 2: Samples from CIFAR-10 (left) and SVHN (right) are shown here. CIFAR-10 contains natural images of animals and vehicles and SVHN contains natural images of house numbers where the centered number is the one of interest.
  • Figure 3: We choose to use a Wide ResNet-28 architecture (Zagoruyko and Komodakis, 2016) for our models. The layers highlighted with a red border are the ones whose feature maps are visualized in the layer visualization figure.
  • Figure 4: The added partial label information results in a substantial improvement in performance over baseline methods. This is shown in a comparison of percent test error and standard deviation (shaded region) between Nullspace Tuning and baseline methods on CIFAR-10 (left) and SVHN (right), with the number of labeled examples varied between 250 and 8,000. The largest improvement of Nullspace Tuning over the next best performing method (VAT) occurs in CIFAR-10 at 2,000 labeled examples.
  • Figure 5: Adding Nullspace Tuning on top of the state-of-the-art MixMatch algorithm yields a considerable further improvement. This is especially evident at 250 labeled examples in CIFAR-10, where error is reduced by a factor of 1.8. This is shown in a comparison of percent test error and standard deviation (shaded regions) between MixMatchNST and MixMatch on CIFAR-10 for a varying number of labels.
  • ...and 4 more figures