ProPML: Probability Partial Multi-label Learning

Łukasz Struski; Adam Pardyl; Jacek Tabor; Bartosz Zieliński

TL;DR

ProPML is a novel probabilistic approach to partial multi-label learning (PML) that extends the binary cross entropy to the PML setup. It does not require suboptimal disambiguation and can therefore be applied to any deep architecture.

Abstract

Partial Multi-label Learning (PML) is a type of weakly supervised learning where each training instance corresponds to a set of candidate labels, among which only some are true. In this paper, we introduce ProPML, a novel probabilistic approach to this problem that extends the binary cross entropy to the PML setup. In contrast to existing methods, it does not require suboptimal disambiguation and, as such, can be applied to any deep architecture. Furthermore, experiments conducted on artificial and real-world datasets indicate that ProPML outperforms existing approaches, especially when the candidate set is highly noisy.

Paper Structure (12 sections, 1 equation, 6 figures, 5 tables)

Introduction
Related works
Probabilistic Partial Multi-label Learning
Experiments
Datasets.
Baseline methods.
Setups.
Results and discussion.
The real-world datasets.
Artificial datasets.
Vision datasets.
Conclusions

Figures (6)

Figure 1: In partial multi-label learning, each training instance corresponds to a set of candidate labels. Only some of them are true (here, checkmarked), but we do not know which. This situation can arise, e.g., when many experts label the same image: some give correct answers, while others make mistakes.
Figure 2: The mean Average Precision (mAP) obtained for the top-5 methods on the VOC2007 dataset, artificially corrupted to PML by flipping negative labels into positive with probability $q$ (the noise ratio), where $q \in \{0.1, 0.2, 0.4\}$. ProPML outperforms existing methods at higher noise ratios. Moreover, unlike the second-best method, the more complex CDCR based on curriculum learning, ProPML requires only a modification of the loss function.
Figure 3: The first component of the ProPML loss corresponds to the expected number of positively predicted labels from the candidate set $S$. If this expected value is small, then none of the labels from $S$ is predicted, and we put strong pressure on predicting at least one of them. However, when the expected value is higher than $1$, at least one label from $S$ is probably predicted. Hence, we only gently push toward predicting the other labels from $S$, which can be false. Note that each $p_i \in [0, 1]$, and therefore $\sum_{i \in S} p_i \in [0, |S|]$.
Figure 4: Results for four real-world datasets depending on the value of the $\lambda$ coefficient in the ProPML loss equation. For average precision, higher is better. For the remaining metrics, lower is better. We observe a strong trend of decreasing Hamming loss with increasing $\lambda$. However, this trend does not generalize to the other metrics, including the most important one, average precision, whose behavior differs significantly between datasets. Most importantly, we observe that the method is stable across the tested hyperparameter values.
Figure 5: Critical difference diagrams comparing results on small real-world datasets shown in the table of real-world results (smaller is better). ProPML performs significantly better than all baseline methods except PML-NI.
...and 1 more figures
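
The captions of Figures 2 and 3 describe two mechanisms that a short sketch may help make concrete: corrupting clean labels into a candidate set by flipping negatives with probability $q$, and a loss whose first component depends on $\sum_{i \in S} p_i$. The Python below is only an illustration under stated assumptions. The function names `corrupt_to_pml` and `propml_style_loss` are hypothetical, and the exact loss form (negative log of the expected number of predicted candidate labels, plus a $\lambda$-weighted binary cross entropy term on non-candidate labels) is an assumption inferred from the captions, not a copy of the paper's equation.

```python
import math
import random


def corrupt_to_pml(labels, q, rng):
    """Turn a clean 0/1 label vector into a 0/1 candidate-set mask S.

    Every true label stays in S; every negative label is flipped into S
    with probability q (the noise ratio described in the Figure 2 caption).
    """
    return [1 if y == 1 or rng.random() < q else 0 for y in labels]


def propml_style_loss(probs, candidates, lam=1.0, eps=1e-12):
    """Assumed loss form, NOT taken verbatim from the paper.

    First term: -log(sum_{i in S} p_i), the negative log of the expected
    number of predicted candidate labels. It is large when that sum is
    near 0 and changes slowly once the sum exceeds 1.
    Second term: lam-weighted binary cross entropy that pushes the
    probabilities of non-candidate labels toward 0. Placing lam on this
    term is an assumption.
    """
    expected_in_s = sum(p for p, c in zip(probs, candidates) if c == 1)
    candidate_term = -math.log(expected_in_s + eps)
    non_candidate_term = -sum(
        math.log(1.0 - p + eps) for p, c in zip(probs, candidates) if c == 0
    )
    return candidate_term + lam * non_candidate_term


# Usage: corrupt one clean label vector, then score hypothetical predictions.
rng = random.Random(0)
clean = [1, 0, 0, 1, 0]
candidates = corrupt_to_pml(clean, q=0.4, rng=rng)
probs = [0.9, 0.1, 0.2, 0.7, 0.05]  # hypothetical per-label sigmoid outputs
loss = propml_style_loss(probs, candidates, lam=1.0)
```

Under this assumed form, no disambiguation step is needed: the candidate mask enters the loss directly, which is in line with the abstract's claim that only the loss function changes.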