Expectation Maximization Pseudo Labels

Moucheng Xu; Yukun Zhou; Chen Jin; Marius de Groot; Daniel C. Alexander; Neil P. Oxtoby; Yipeng Hu; Joseph Jacob

TL;DR

This paper presents a full generalisation of pseudo-labels under Bayes' theorem, termed Bayesian Pseudo Labels, and showcases the applications of pseudo-labelling and its generalised form, Bayesian Pseudo-Labelling, in the semi-supervised segmentation of medical images.

Abstract

In this paper, we study pseudo-labelling. Pseudo-labelling employs raw inferences on unlabelled data as pseudo-labels for self-training. We elucidate the empirical successes of pseudo-labelling by establishing a link between this technique and the Expectation Maximisation algorithm. Through this, we realise that the original pseudo-labelling serves as an empirical estimation of its more comprehensive underlying formulation. Following this insight, we present a full generalisation of pseudo-labels under Bayes' theorem, termed Bayesian Pseudo Labels. Subsequently, we introduce a variational approach to generate these Bayesian Pseudo Labels, involving the learning of a threshold to automatically select high-quality pseudo labels. In the remainder of the paper, we showcase the applications of pseudo-labelling and its generalised form, Bayesian Pseudo-Labelling, in the semi-supervised segmentation of medical images. Specifically, we focus on: 1) 3D binary segmentation of lung vessels from CT volumes; 2) 2D multi-class segmentation of brain tumours from MRI volumes; 3) 3D binary segmentation of whole brain tumours from MRI volumes; and 4) 3D binary segmentation of prostate from MRI volumes. We further demonstrate that pseudo-labels can enhance the robustness of the learned representations. The code is released in the following GitHub repository: https://github.com/moucheng2017/EMSSL
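The fixed-threshold case described in the abstract can be sketched in a few lines. This is only an illustrative sketch, not the released EMSSL code: the function names, the use of binary cross-entropy, the unsupervised weight `alpha` and the default threshold of 0.5 are all assumptions made for the example.

```python
import math

def e_step(probs, threshold=0.5):
    """E-step (sketch): threshold foreground probabilities into hard pseudo-labels."""
    return [1 if p > threshold else 0 for p in probs]

def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)

def m_step_objective(probs_l, labels, probs_u, threshold=0.5, alpha=1.0):
    """Objective minimised in the M-step (sketch): supervised loss on labelled
    pixels plus an alpha-weighted loss on unlabelled pixels against their
    pseudo-labels. `alpha` is a hypothetical weighting term."""
    pseudo = e_step(probs_u, threshold)
    return bce(probs_l, labels) + alpha * bce(probs_u, pseudo)

# Hypothetical per-pixel foreground probabilities on an unlabelled image.
probs_u = [0.9, 0.2, 0.6, 0.4]
print(e_step(probs_u))                 # [1, 0, 1, 0]
print(e_step(probs_u, threshold=0.8))  # [1, 0, 0, 0]
```

In an actual training loop the pseudo-labels would be treated as constants (no gradient flows through the thresholding), and the model parameters would be updated by gradient descent on this objective.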

Paper Structure (25 sections, 21 equations, 10 figures, 6 tables)

Introduction
Semi-supervised learning and Entropy regularisation
Consistency Regularisation
Pseudo Labelling
Motivations and contributions
Related works
Pseudo Labelling As Expectation-Maximization
Problem formulation
Pseudo labels as latent variables
E-M Pseudo Labelling
On the convergence of Pseudo Labelling from the perspective of EM
Generalisation of Pseudo Labels via Variational Inference for Segmentation
Confidence threshold as latent variable
Variational E-step
Experimental Results
...and 10 more sections

Figures (10)

Figure 1: Pseudo-labelling process for binary segmentation. The pseudo-label $y'_n$ is generated from unlabelled data $x_u$ using the model with parameters $\theta$ from the last iteration, so pseudo-labelling can be seen as the E-step of Expectation-Maximization. The M-step updates $\theta$ using $y'_n$, $y$ and data $X$. In our first implementation, SegPL, the threshold $T$ for selecting pseudo-labels is fixed; this is the original pseudo-labelling, an empirical approximation of its true generalisation. In our second implementation, SegPL-VI, the threshold $T$ is dynamic and learnt via variational inference, giving a learnt approximation of its true generalisation.
Figure 2: The implementation of the proposed Bayesian Pseudo Labels. Only the unsupervised learning part is illustrated.
Figure 3: SegPL statistically outperforms the best performing baseline CPS when trained on 2 labelled volumes from the CARVE dataset. Each data point represents a single testing image.
Figure 4: Y-axis: Learnt threshold in the experiment of Task01 Brain Tumour. X-axis: training iterations. The mean of the prior is 0.9 and the std of the prior is 0.1. The learnt threshold converged around 0.82 after 2000 iterations.
Figure 5: Y-axis: Learnt threshold in the experiment of Task05 Prostate. X-axis: training iterations. The mean of the prior is 0.9 and the std of the prior is 0.1. The learnt threshold converged around 0.785 after 2000 iterations.
...and 5 more figures
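To make the learnt-threshold setting in the captions above more concrete, the following is a rough sketch of two ingredients a variational treatment of the threshold would typically need: a reparameterised sample of the threshold and a KL penalty towards the prior. It assumes a Gaussian approximate posterior and a Gaussian prior with mean 0.9 and std 0.1 (the prior values quoted in Figures 4 and 5). The posterior std of 0.1, the clipping to [0, 1] and all function names are hypothetical choices for illustration, not details confirmed by the source.

```python
import math
import random

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ) for univariate Gaussians."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
            - 0.5)

def sample_threshold(mu_q, sigma_q, rng):
    """Reparameterised sample T = mu + sigma * eps, clipped to [0, 1] (assumed)."""
    eps = rng.gauss(0.0, 1.0)
    return min(max(mu_q + sigma_q * eps, 0.0), 1.0)

rng = random.Random(0)
prior_mu, prior_sigma = 0.9, 0.1   # prior quoted in the figure captions
post_mu, post_sigma = 0.82, 0.1    # 0.82 from Figure 4; the std is hypothetical

threshold = sample_threshold(post_mu, post_sigma, rng)
penalty = gaussian_kl(post_mu, post_sigma, prior_mu, prior_sigma)
print(round(penalty, 4))  # 0.32
```

Under these assumptions the KL term grows as the learnt mean drifts away from the prior mean, which is what keeps a learnt threshold from collapsing to trivial values.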
