Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-Supervised Multi-Label Learning
Jia-Hao Xiao, Ming-Kun Xie, Heng-Bo Fan, Gang Niu, Masashi Sugiyama, Sheng-Jun Huang
TL;DR
This work tackles semi-supervised multi-label learning (SSMLL) by addressing two intertwined challenges: generating high-quality pseudo-labels and selecting effective per-class thresholds. It introduces Dual-Decoupling Learning (D2L), which separates correlative and discriminative feature learning and decouples pseudo-label generation from utilization, and Metric-Adaptive Thresholding (MAT), which finds class-wise thresholds by maximizing a chosen metric on labeled data. Empirical results on VOC, COCO, and NUS show state-of-the-art performance across different proportions of labeled data, with ablations confirming substantial contributions from both D2L components and MAT, especially when labeled data is scarce. The approach yields higher-quality pseudo-labels and more reliable thresholds, improving label assignment under diverse true-label distributions.
Abstract
Semi-supervised multi-label learning (SSMLL) is a powerful framework for leveraging unlabeled data to reduce the high cost of collecting precise multi-label annotations. Unlike in semi-supervised learning, one cannot simply select the most probable label as the pseudo-label in SSMLL, because an instance may contain multiple semantics. To solve this problem, the mainstream method develops an effective thresholding strategy to generate accurate pseudo-labels. Unfortunately, this method neglects the quality of model predictions and its potential impact on pseudo-labeling performance. In this paper, we propose a dual-perspective method to generate high-quality pseudo-labels. To improve the quality of model predictions, we perform dual-decoupling to boost the learning of correlative and discriminative features, while refining the generation and utilization of pseudo-labels. To obtain proper class-wise thresholds, we propose a metric-adaptive thresholding strategy that estimates the thresholds by maximizing pseudo-label performance for a given metric on labeled data. Experiments on multiple benchmark datasets show that the proposed method achieves state-of-the-art performance and outperforms the compared methods by a significant margin.
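To make the metric-adaptive thresholding idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the chosen metric is per-class F1 and that thresholds are found by a simple grid search over the labeled predictions; the function names (`f1_binary`, `metric_adaptive_thresholds`, `pseudo_label`) and the grid are hypothetical choices made for illustration, and the paper's actual search procedure may differ.

```python
import numpy as np


def f1_binary(y_true, y_pred):
    """F1 score for one class; returns 0.0 when there are no positives at all."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


def metric_adaptive_thresholds(probs, labels, metric=f1_binary, grid=None):
    """Pick, for each class, the grid threshold that maximizes `metric`
    on labeled data (assumed grid search, not the paper's exact procedure).

    probs:  (n_labeled, n_classes) predicted probabilities
    labels: (n_labeled, n_classes) binary ground-truth labels
    """
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)  # assumed candidate thresholds
    n_classes = probs.shape[1]
    thresholds = np.empty(n_classes)
    for k in range(n_classes):
        scores = [
            metric(labels[:, k], (probs[:, k] >= t).astype(int)) for t in grid
        ]
        # np.argmax returns the first maximizer, i.e. the lowest best threshold.
        thresholds[k] = grid[int(np.argmax(scores))]
    return thresholds


def pseudo_label(probs_unlabeled, thresholds):
    """Assign every label whose probability reaches its class-wise threshold."""
    return (probs_unlabeled >= thresholds).astype(int)


# Toy labeled set: 4 instances, 2 classes.
labeled_probs = np.array([[0.91, 0.62], [0.82, 0.12], [0.33, 0.58], [0.21, 0.07]])
labeled_y = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])
th = metric_adaptive_thresholds(labeled_probs, labeled_y)

# Toy unlabeled predictions, pseudo-labeled with the estimated thresholds.
unlabeled_probs = np.array([[0.90, 0.10], [0.20, 0.60]])
pl = pseudo_label(unlabeled_probs, th)
```

Because each class gets its own threshold, an instance can receive zero, one, or several pseudo-labels, which is the behavior that picking the single most probable label cannot provide in the multi-label setting.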
