Pushing One Pair of Labels Apart Each Time in Multi-Label Learning: From Single Positive to Full Labels
Xiang Li, Xinrui Wang, Songcan Chen
TL;DR
This work addresses robust multi-label learning when only a single positive label is observed per sample (SPMLL), as well as the standard full-label setting (MLL). It introduces OPML, a unified loss for both settings that updates only one pair of labels at a time via a log-sum-exp-based objective, which prevents negative labels from dominating training. The approach is augmented with high-rank regularization, a soft-OPML variant with adaptive label smoothing, and AP-based label correction. The resulting method is robust to noisy labels and performs strongly on standard benchmarks, notably achieving state-of-the-art results on CUB in SPMLL and competitive results on full-label MLL. The results suggest that a high-rank label matrix can slow performance degradation under label noise, which is useful in real-world settings where annotations are scarce.
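The exact OPML formula is not given here. As a hedged illustration of the mechanism described above, the NumPy sketch below assumes a generic pairwise log-sum-exp form, tau * log(1 + sum over pairs of exp((s_n - s_p) / tau)), where s_p is a positive score and s_n a negative (or assumed-negative) score. This is a common pairwise ranking loss and not necessarily the authors' objective; the function names and the temperature `tau` are hypothetical. The sketch contrasts gradient mass with BCE: under BCE each assumed negative contributes its own gradient, so with one positive and 99 assumed negatives the negatives dominate, whereas under the log-sum-exp form the pair weights are softmax entries that sum to less than 1, and a small `tau` concentrates nearly all of that weight on the single hardest pair.

```python
import numpy as np


def pairwise_lse_loss(scores, pos_mask, tau=1.0):
    """Hypothetical pairwise log-sum-exp loss for one sample.

    A sketch only; not claimed to be the paper's exact OPML objective.
    scores:   (C,) real-valued label scores (logits)
    pos_mask: (C,) boolean array, True for observed positives
    tau:      temperature; smaller tau makes log-sum-exp closer to a hard max
    Returns (loss, weights), where weights[i, j] = d loss / d (s_neg_j - s_pos_i).
    """
    pos = scores[pos_mask]                      # (P,) positive scores
    neg = scores[~pos_mask]                     # (N,) negative / assumed-negative scores
    z = (neg[None, :] - pos[:, None]) / tau     # (P, N) scaled pairwise violations
    # Stable log(1 + sum_ij exp(z_ij)); the leading 0.0 logit supplies the "1".
    flat = np.concatenate([[0.0], z.ravel()])
    m = flat.max()
    lse = m + np.log(np.exp(flat - m).sum())
    loss = tau * lse
    # Each pair's gradient weight is a softmax entry, so the weights sum to < 1.
    weights = np.exp(z - lse)
    return loss, weights


def bce_score_grads(scores, pos_mask):
    """Gradient of summed binary cross-entropy w.r.t. each logit: sigmoid(s) - y."""
    y = pos_mask.astype(float)
    return 1.0 / (1.0 + np.exp(-scores)) - y


C = 100
pos_mask = np.zeros(C, dtype=bool)
pos_mask[0] = True                  # SPMLL: a single observed positive
scores = np.zeros(C)                # untrained model: all logits equal

g_bce = bce_score_grads(scores, pos_mask)
loss0, w = pairwise_lse_loss(scores, pos_mask)
print("BCE gradient mass on negatives:", g_bce[~pos_mask].sum())  # 99 * 0.5
print("BCE gradient on the positive:  ", g_bce[pos_mask].sum())   # -0.5
print("pairwise LSE total pair weight:", w.sum())                 # 99 / 100

# One assumed negative now outscores the positive; a small tau focuses on that pair.
scores_hard = scores.copy()
scores_hard[5] = 2.0
_, w_hard = pairwise_lse_loss(scores_hard, pos_mask, tau=0.1)
print("weight on hardest pair:", w_hard.max())
```

The bounded total weight is the design point: no matter how many labels are assumed negative, their combined pull on the model cannot exceed one unit of gradient, which is one way to read "preventing domination by negative labels."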
Abstract
In Multi-Label Learning (MLL), accurately annotating every object present is extremely challenging because of high costs and limited knowledge. A more practical and cheaper alternative is Single Positive Multi-Label Learning (SPMLL), in which only one positive label needs to be provided per sample. Existing SPMLL methods usually treat unknown labels as negatives, which inevitably introduces false negatives as noisy labels. Worse, these methods are often trained with the Binary Cross-Entropy (BCE) loss, which is notoriously sensitive to noisy labels. To mitigate this issue, we design an objective function for SPMLL that pushes only one pair of labels apart each time, preventing the domination of negative labels, which is the main culprit behind fitting noisy labels in SPMLL. To further combat such noisy labels, we explore the high-rankness of the label matrix, which can also push different labels apart. Extending this objective directly from SPMLL to MLL with full labels yields a unified loss applicable to both settings. Experiments on real datasets show that the proposed loss is not only more robust to noisy labels in SPMLL but also works well with full labels. Moreover, we empirically find that high-rankness can mitigate the dramatic performance drop in SPMLL. Most surprisingly, even without any regularization or fine-tuned label correction, our loss alone outperforms state-of-the-art SPMLL methods on CUB, a dataset that severely lacks labels.
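How high-rankness is enforced is not specified here. One hedged way to picture it: treat a batch of predictions as a B x C label matrix and reward a large nuclear norm (the sum of singular values), a standard convex surrogate for rank. The sketch below assumes this choice; `high_rank_penalty` is a hypothetical name, and the paper may use a different regularizer. It compares two 4 x 4 matrices with the same Frobenius norm (2.0): when every sample predicts the same label vector, the matrix has rank 1 and nuclear norm 2; when samples predict distinct labels, the rank is 4 and the nuclear norm doubles to 4, so minimizing the penalty favors the matrix in which different labels are pushed apart.

```python
import numpy as np


def high_rank_penalty(P):
    """Hypothetical high-rank regularizer: negative nuclear norm of a label matrix.

    P: (B, C) matrix of predicted label probabilities for a batch.
    Minimizing this penalty maximizes the nuclear norm (sum of singular values),
    a common convex surrogate for rank. Not claimed to be the paper's regularizer.
    """
    return -np.linalg.norm(P, ord="nuc")


# Collapsed predictions: every sample gets the same label vector (rank 1).
low_rank = 0.5 * np.ones((4, 4))
# Spread-out predictions: each sample is confident about a different label (rank 4).
high_rank = np.eye(4)

for name, P in [("collapsed", low_rank), ("spread", high_rank)]:
    print(
        name,
        "rank:", np.linalg.matrix_rank(P),
        "frobenius:", round(float(np.linalg.norm(P)), 6),
        "penalty:", round(float(high_rank_penalty(P)), 6),
    )
```

For a fixed Frobenius norm, the nuclear norm is smallest (equal to the Frobenius norm) at rank 1 and largest when all singular values are equal, which is why it can serve as a differentiable stand-in for "high rank" in a sketch like this.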
