Learnability with Partial Labels and Adaptive Nearest Neighbors

Nicolas A. Errandonea, Santiago Mazuelas, Jose A. Lozano, Sanjoy Dasgupta

Abstract

Prior work on partial labels learning (PLL) has shown that learning is possible even when each instance is associated with a bag of labels, rather than a single accurate but costly label. However, the necessary conditions for learning with partial labels remain unclear, and existing PLL methods are effective only in specific scenarios. In this work, we mathematically characterize the settings in which PLL is feasible. In addition, we present PL A-$k$NN, an adaptive nearest-neighbors algorithm for PLL that is effective in general scenarios and enjoys strong performance guarantees. Experimental results corroborate that PL A-$k$NN can outperform state-of-the-art methods in general PLL scenarios.

Paper Structure (41 sections, 15 theorems, 107 equations, 4 figures, 2 algorithms)

Key Result

Theorem 2.2

A bag generation process $P(s \mid y, x)$ is reconstructible if and only if the vectors $\mathbf{p}_{1,x}, \mathbf{p}_{2,x}, \dots, \mathbf{p}_{|\mathcal{Y}|,x} \in \mathbb{R}^{|\mathcal{S}|}$ are linearly independent for every $x$.
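
The linear-independence condition can be checked numerically at a fixed $x$ with a rank test. The sketch below is illustrative only: the definition of $\mathbf{p}_{y,x}$ is not reproduced on this page, so it assumes that $\mathbf{p}_{y,x}$ is the vector of probabilities $P(s \mid y, x)$ over all bags $s \in \mathcal{S}$. The function name `is_reconstructible_at_x` and the two toy matrices are hypothetical and do not come from the paper.

```python
import numpy as np


def is_reconstructible_at_x(P, tol=1e-10):
    """Rank test for the linear-independence condition at a single x.

    P has shape (num_labels, num_bags). Row y is ASSUMED to be the vector
    p_{y,x}, i.e. the probabilities P(s | y, x) over every bag s.
    The rows are linearly independent exactly when rank(P) == num_labels.
    """
    P = np.asarray(P, dtype=float)
    return bool(np.linalg.matrix_rank(P, tol=tol) == P.shape[0])


# Hypothetical toy setting: two labels, bags ordered as {1}, {2}, {1, 2}.
informative = [
    [0.6, 0.0, 0.4],  # true label 1: bag {1} w.p. 0.6, bag {1, 2} w.p. 0.4
    [0.0, 0.7, 0.3],  # true label 2: bag {2} w.p. 0.7, bag {1, 2} w.p. 0.3
]

# Degenerate setting: both labels always produce the full bag {1, 2},
# so the two rows coincide and the bags carry no label information.
degenerate = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]

print(is_reconstructible_at_x(informative))
print(is_reconstructible_at_x(degenerate))
```

The theorem requires the condition for every $x$, so a check of this kind only covers the single point at which the matrix was built. The tolerance `tol` is a numerical choice: rows that are nearly dependent may pass or fail depending on its value.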

Figures (4)

  • Figure 1: Comparison of the error rates of PL A-$k$NN and state-of-the-art methods on Fashion-MNIST and MSRCv2 under an increasing noise rate. The results show that PL A-$k$NN consistently outperforms existing approaches across a wide range of noise levels. See Appendix D for the comparison on MNIST, CIFAR-10, and MirFlickr.
  • Figure 2: Comparison of the error rates of PL A-$k$NN and $k$NN benchmarks on Fashion-MNIST and MSRCv2 under an increasing noise rate. The results show that PL A-$k$NN outperforms $10$-NN and A-$k$NN across a wide range of noise levels, while performing comparably to the best $k$NN. See Appendix D for the comparison on MNIST, CIFAR-10, and MirFlickr.
  • Figure 3: Comparison of the error rates of PL A-$k$NN and state-of-the-art methods for MNIST, CIFAR-10, and MirFlickr under an increasing noise rate. The results show that PL A-$k$NN outperforms existing approaches across a wide range of noise levels.
  • Figure 4: Comparison of the error rates of PL A-$k$NN and $k$NN benchmarks for MNIST, CIFAR-10, and MirFlickr under an increasing noise rate. The results show that PL A-$k$NN outperforms $10$-NN and A-$k$NN across a wide range of noise levels, while performing comparably to the best $k$NN.

Theorems & Definitions (19)

  • Definition 2.1
  • Theorem 2.2
  • Definition 2.3
  • Corollary 2.4
  • Theorem 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Theorem 3.4
  • Theorem 3.5
  • Theorem B.1
  • ...and 9 more