Pushing One Pair of Labels Apart Each Time in Multi-Label Learning: From Single Positive to Full Labels

Xiang Li; Xinrui Wang; Songcan Chen

TL;DR

This work addresses robust multi-label learning when only a single positive label is observed per sample (SPMLL), and extends the same loss to the full-label setting (MLL). It introduces OPML, a unified loss that updates only one pair of labels at a time via a log-sum-exp-based objective, which prevents negative labels from dominating training. The approach is augmented with high-rank regularization, a soft-OPML variant with adaptive label smoothing, and AP-based label correction. It is robust to noisy labels and performs well on standard benchmarks, notably achieving state-of-the-art results on CUB in SPMLL and competitive results on full-label MLL. The results also suggest that high-rankness of the label matrix can mitigate the performance drop under label noise, which is useful in annotation-scarce settings.
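
The idea of updating one pair of labels at a time can be illustrated with a small sketch. This summary does not reproduce the OPML formula, so the code below is an assumption for illustration only, not the authors' loss: `hard_pair_loss` applies a softplus to the margin between the single positive and its highest-scoring competitor, and `lse_pair_loss` is the generic log-sum-exp smoothing of that maximum. All function names are hypothetical.

```python
import math

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def hardest_negative(scores, pos_idx):
    # Index of the highest-scoring label other than the observed positive.
    return max((j for j in range(len(scores)) if j != pos_idx),
               key=lambda j: scores[j])

def hard_pair_loss(scores, pos_idx):
    # Illustrative assumption: penalise only the (positive, hardest negative) pair.
    neg_idx = hardest_negative(scores, pos_idx)
    return softplus(scores[neg_idx] - scores[pos_idx])

def hard_pair_grad(scores, pos_idx):
    # Gradient of hard_pair_loss w.r.t. the scores: only two entries are non-zero.
    neg_idx = hardest_negative(scores, pos_idx)
    margin = scores[neg_idx] - scores[pos_idx]
    sig = math.exp(margin - softplus(margin))  # stable sigmoid(margin)
    grad = [0.0] * len(scores)
    grad[pos_idx] = -sig   # gradient descent raises the positive score
    grad[neg_idx] = sig    # and lowers the hardest negative score
    return grad

def lse_pair_loss(scores, pos_idx):
    # Smooth variant: log(1 + sum_j exp(s_j - s_pos)) over all other labels j.
    diffs = [s - scores[pos_idx] for j, s in enumerate(scores) if j != pos_idx]
    m = max(diffs)
    lse = m + math.log(sum(math.exp(d - m) for d in diffs))
    return softplus(lse)

scores = [2.0, 1.0, 0.0, -1.0]   # hypothetical logits; label 0 is the observed positive
print(hard_pair_loss(scores, 0), hard_pair_grad(scores, 0), lse_pair_loss(scores, 0))
```

In this sketch the hard variant leaves every label outside the selected pair with zero gradient, which is the sense in which unknown labels treated as negatives cannot dominate the update. Because log-sum-exp upper-bounds the maximum, the smooth variant is never smaller than the hard one.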

Abstract

In Multi-Label Learning (MLL), it is extremely challenging to accurately annotate every object that appears, owing to high annotation costs and limited annotator knowledge. A more practical and cheaper alternative is Single Positive Multi-Label Learning (SPMLL), where only one positive label needs to be provided per sample. Existing SPMLL methods usually treat unknown labels as negatives, which inevitably introduces false negatives as noisy labels. More seriously, Binary Cross Entropy (BCE) loss is often used for training, and it is notoriously not robust to noisy labels. To mitigate this issue, we customize an objective function for SPMLL that pushes only one pair of labels apart each time, preventing the domination of negative labels, which is the main culprit behind fitting noisy labels in SPMLL. To further combat such noisy labels, we explore the high-rankness of the label matrix, which can also push different labels apart. By directly extending from SPMLL to MLL with full labels, we derive a unified loss applicable to both settings. Experiments on real datasets demonstrate that the proposed loss is not only more robust to noisy labels in SPMLL but also works well with full labels. In addition, we empirically find that high-rankness can mitigate the dramatic performance drop in SPMLL. Most surprisingly, even without any regularization or fine-tuned label correction, adopting our loss alone outperforms state-of-the-art SPMLL methods on CUB, a dataset that severely lacks labels.
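
To make the high-rankness idea concrete, the sketch below uses the nuclear norm (the sum of singular values) as a stand-in for rank. This choice is an assumption: the abstract does not say which surrogate the paper uses, and `nuclear_norm`, `high_rank_penalty`, and the `weight` parameter are hypothetical names. The example contrasts a collapsed prediction matrix, where every sample receives the same scores, with a diverse one.

```python
import numpy as np

def nuclear_norm(Y):
    # Sum of singular values; a standard convex surrogate for matrix rank.
    return float(np.linalg.svd(Y, compute_uv=False).sum())

def high_rank_penalty(Y_pred, weight=0.1):
    # Assumed form: a negative, size-normalised nuclear norm, so that
    # minimising the penalty favours higher-rank prediction matrices.
    return -weight * nuclear_norm(Y_pred) / Y_pred.shape[0]

# Hypothetical 3-sample, 3-label prediction matrices.
collapsed = np.tile([0.9, 0.1, 0.1], (3, 1))   # identical rows, rank 1
diverse = 0.8 * np.eye(3) + 0.1                # a different top label per row

print(np.linalg.matrix_rank(collapsed), nuclear_norm(collapsed))
print(np.linalg.matrix_rank(diverse), nuclear_norm(diverse))
print(high_rank_penalty(collapsed), high_rank_penalty(diverse))
```

Under these assumptions the diverse matrix has the larger nuclear norm and therefore the lower penalty, so adding such a term to a training loss would discourage predictions from collapsing onto the same few labels.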

Paper Structure (17 sections, 11 equations, 14 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Proposed approach
Problem statement
The proposed OPML loss
Gradient analysis
High-rank regularization
Soft variant and label correction
Experiments
Experiments settings
Experimental results of SPMLL
Ablation study
Hyper-parameters study
Grad-CAM visualization
Experimental results of full labels
...and 2 more sections

Figures (14)

Figure 1: The optimization procedure for BCE and our OPML loss in SPMLL. The solid and dotted arrows represent the optimization at the current and previous steps, respectively. Rhombuses and circles with solid and dotted lines are the original positions in the initial state and the intermediate positions at the current step, respectively. Figure 1a shows that BCE loss optimizes all pairs of labels at once, so it can easily be dominated by negative labels in SPMLL. Figures 1b-1d show three steps of pushing one pair of labels apart each time. Note that the selected negative label is the one with the maximum score, i.e., the one nearest to the single positive label at the current step.
Figure 2: The framework combining high-rank regularization with our OPML loss, where $\mathbf{Y}_{pred}$ and $\mathbf{Y}_{obs}$ denote the predicted and observed label matrices, respectively.
Figure 3: Histograms of the predicted confidence for the positive labels in the test label matrix. "OPML" (blue) and "BCE" (orange) denote the histograms obtained when training with the OPML and BCE losses, respectively. The width of each bin is $0.2$. Note that in one subfigure (labelled `fig33b` in the source), when trained with BCE loss, the frequencies of predicted confidence falling into the intervals $[0.6,0.8)$ and $[0.8,1]$ are zero.
Figure 4: The effectiveness of high-rank regularization. With this regularization, the performance drop is less dramatic.
Figure 5: Hyper-parameters study of $\widetilde{\alpha}$ and $\widetilde{\beta}$. The best performance is marked with orange.
...and 9 more figures
