Rethinking Self-training for Semi-supervised Landmark Detection: A Selection-free Approach

Haibo Jin, Haoxuan Che, Hao Chen

TL;DR

Self-Training for Landmark Detection (STLD) is a self-training method that does not require explicit pseudo-label selection. Instead, it deals with confirmation bias by constructing a task curriculum that progressively transitions from more confident to less confident tasks over the rounds of self-training.

Abstract

Self-training is a simple yet effective method for semi-supervised learning, in which pseudo-label selection plays an important role in handling confirmation bias. Despite its popularity, applying self-training to landmark detection faces three problems: 1) the selected confident pseudo-labels often contain data bias, which may hurt model performance; 2) it is not easy to decide a proper threshold for sample selection, as the localization task can be sensitive to noisy pseudo-labels; 3) coordinate regression does not output confidence, making selection-based self-training infeasible. To address these issues, we propose Self-Training for Landmark Detection (STLD), a method that does not require explicit pseudo-label selection. Instead, STLD deals with confirmation bias by constructing a task curriculum that progressively transitions from more confident to less confident tasks over the rounds of self-training. Pseudo pretraining and shrink regression are the two essential components of this curriculum: the former is the first task of the curriculum and provides a better model initialization, while the latter is added in later rounds to leverage the pseudo-labels directly in a coarse-to-fine manner. Experiments on three facial and one medical landmark detection benchmarks show that STLD consistently outperforms existing methods in both semi- and omni-supervised settings. The code is available at https://github.com/jhb86253817/STLD.
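
The selection-free loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear model, the function names `fit_linear` and `self_train`, the learning rate, and the step counts are all hypothetical, and the fine-tuning step that follows pseudo pretraining is an assumption about how a pretrained initialization would be used. Shrink regression is left out because its details are not given here.

```python
import numpy as np

def fit_linear(X, Y, W_init=None, lr=0.05, steps=200):
    """Gradient descent on mean squared error for a linear map X @ W ~ Y."""
    W = np.zeros((X.shape[1], Y.shape[1])) if W_init is None else W_init.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ W - Y) / len(X)
        W -= lr * grad
    return W

def self_train(X_lab, Y_lab, X_unl, rounds=3):
    """Toy selection-free self-training (hypothetical stand-in for a landmark model)."""
    # (1) Supervised training on the labeled data only.
    W = fit_linear(X_lab, Y_lab)
    for _ in range(rounds):
        # (2) Estimate pseudo-labels for ALL unlabeled samples; nothing is filtered out.
        Y_pseudo = X_unl @ W
        # (3a) Pseudo pretraining: fit a fresh model on the pseudo-labels alone
        #      to obtain an initialization.
        W_pre = fit_linear(X_unl, Y_pseudo)
        # (3b) Assumed follow-up task: fine-tune that initialization on labeled data.
        W = fit_linear(X_lab, Y_lab, W_init=W_pre)
    return W
```

The point of the sketch is the control flow: every unlabeled sample receives a pseudo-label in each round, and the order of training tasks, rather than a confidence threshold, limits how much the pseudo-labels can mislead the model.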

Paper Structure

This paper contains 29 sections, 12 equations, 15 figures, 6 tables, and 1 algorithm.

Figures (15)

  • Figure 1: Self-Training for Landmark Detection (STLD). (1) The model is first trained on the labeled data with supervised learning, (2) then estimates pseudo-labels for the unlabeled data, and (3) is retrained on both labeled and pseudo-labeled data with the constructed task curriculum.
  • Figure 2: (a) Visualized label density maps of four data groups from 300W STZ13. The labels (i.e., coordinates) are mapped to a $256\times256$ map and plotted as density maps with 12 bins along each axis. (b) The KL divergence between the unlabeled GT and each of three data groups: 1) labeled data, 2) confident pseudo-labels, and 3) all pseudo-labels, calculated from the label density maps over 300W STZ13. Both the average distance and the individual distances of randomly selected landmarks are plotted.
  • Figure 3: Comparison of selection-based and selection-free methods over the rounds, trained on 300W STZ13 with 5% labeled. (a) Comparison of test performance. (b) Comparison of the noise in the estimated pseudo-labels.
  • Figure 4: 2D histograms of the offsets of pseudo-labels relative to GTs, trained on 300W with different labeled ratios. We analyzed both heatmap ((a)-(b)) and coordinate ((c)-(d)) models.
  • Figure 5: Architecture of the two base models for landmark detection. (a) Heatmap regression. (b) Coordinate regression with transformer decoder as the head.
  • ...and 10 more figures
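
The label density comparison in the Figure 2 caption can be approximated as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function names `label_density` and `kl_divergence`, the smoothing constant `eps`, and the direction of the KL divergence are all hypothetical choices. Only the $256\times256$ map size and the 12 bins per axis come from the caption.

```python
import numpy as np

def label_density(coords, size=256, bins=12, eps=1e-8):
    """Normalized 2D histogram of (x, y) landmark coordinates on a size x size map."""
    hist, _, _ = np.histogram2d(
        coords[:, 0], coords[:, 1], bins=bins, range=[[0, size], [0, size]]
    )
    hist = hist + eps  # assumed smoothing so that empty bins do not produce log(0)
    return hist / hist.sum()

def kl_divergence(p, q):
    """KL(p || q) between two density maps of the same shape."""
    return float(np.sum(p * np.log(p / q)))
```

For example, `kl_divergence(label_density(gt_coords), label_density(pseudo_coords))` would measure how far the pseudo-label distribution is from the ground-truth distribution of a single landmark, where `gt_coords` and `pseudo_coords` are hypothetical `(N, 2)` arrays of pixel coordinates.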