Learning from Noisy Labels via Conditional Distributionally Robust Optimization

Hui Guo, Grace Y. Yi, Boyu Wang

TL;DR

This work proposes a novel robust pseudo-labeling algorithm that uses the likelihood ratio test to construct a pseudo-empirical distribution, which serves as a robust reference probability distribution in conditional distributionally robust optimization (CDRO). It also derives closed-form expressions for the empirical robust risk and the optimal Lagrange multiplier of the dual problem, facilitating a principled balance between robustness and model fitting.

Abstract

While crowdsourcing has emerged as a practical solution for labeling large datasets, it presents a significant challenge in learning accurate models due to noisy labels from annotators with varying levels of expertise. Existing methods typically estimate the true label posterior, conditioned on the instance and noisy annotations, to infer true labels or adjust loss functions. These methods, however, often overlook potential misspecification in the true label posterior, which can degrade model performance, especially in high-noise scenarios. To address this issue, we investigate learning from noisy annotations with an estimated true label posterior through the framework of conditional distributionally robust optimization (CDRO). We propose formulating the problem as minimizing the worst-case risk within a distance-based ambiguity set centered around a reference distribution. By examining the strong duality of the formulation, we derive upper bounds for the worst-case risk and develop an analytical solution for the dual robust risk for each data point. This leads to a novel robust pseudo-labeling algorithm that leverages the likelihood ratio test to construct a pseudo-empirical distribution, providing a robust reference probability distribution in CDRO. Moreover, to devise an efficient algorithm for CDRO, we derive a closed-form expression for the empirical robust risk and the optimal Lagrange multiplier of the dual problem, facilitating a principled balance between robustness and model fitting. Our experimental results on both synthetic and real-world datasets demonstrate the superiority of our method.
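
To make the idea of a worst-case risk and its dual concrete, the following is a minimal sketch for a single data point with a binary true label. It is a toy illustration under stated assumptions, not the paper's derivation: the reference posterior is a Bernoulli distribution with `p = P(y = 1)`, the ambiguity set is a 1-Wasserstein ball of radius `eps` under the 0-1 cost (which on two points reduces to `|q - p| <= eps`), and the function names `worst_case_risk_primal` and `worst_case_risk_dual` are hypothetical.

```python
def worst_case_risk_primal(p, loss0, loss1, eps):
    """Worst-case expected loss over Bernoulli(q) with |q - p| <= eps.

    The objective q * loss1 + (1 - q) * loss0 is linear in q, so the
    maximizer moves as much mass as allowed toward the label with the
    larger loss, clipped to the interval [0, 1].
    """
    if loss1 >= loss0:
        q = min(1.0, p + eps)
    else:
        q = max(0.0, p - eps)
    return q * loss1 + (1.0 - q) * loss0


def worst_case_risk_dual(p, loss0, loss1, eps):
    """Dual form: min over gamma >= 0 of
    gamma * eps + E_p[ max over y' of (loss(y') - gamma * 1{y' != y}) ].

    The dual objective is convex and piecewise linear in gamma with a
    single kink at |loss1 - loss0|, so checking gamma = 0 and the kink
    is enough to find the minimum.
    """
    def dual_objective(gamma):
        inner_if_ref_1 = max(loss1, loss0 - gamma)  # reference label is 1
        inner_if_ref_0 = max(loss0, loss1 - gamma)  # reference label is 0
        return gamma * eps + p * inner_if_ref_1 + (1.0 - p) * inner_if_ref_0

    candidates = (0.0, abs(loss1 - loss0))
    return min(dual_objective(gamma) for gamma in candidates)


if __name__ == "__main__":
    # Illustrative values only: reference posterior 0.7, radius 0.1.
    primal = worst_case_risk_primal(0.7, loss0=1.2, loss1=0.3, eps=0.1)
    dual = worst_case_risk_dual(0.7, loss0=1.2, loss1=0.3, eps=0.1)
    print(primal, dual)
```

In this toy setting the two functions agree, which is the strong-duality property the abstract refers to. Setting `eps` to zero recovers the ordinary expected loss under the reference posterior, and increasing `eps` trades model fit for robustness.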

Paper Structure

This paper contains 40 sections, 12 theorems, 100 equations, 9 figures, 7 tables, 1 algorithm.

Key Result

Proposition 2.1

Assume that for every given $\mathbf{x}\in\mathcal{X}$, $\widetilde{\mathbf{y}}\in\mathcal{Y}^R$ and $\psi\in\Psi$, $\ell(\psi(\mathbf{x}),\cdot)\in L^{1}(P_{\mathrm{y}|\mathbf{x},\widetilde{\mathbf{y}}})$, where $L^{1}(\cdot)$ is defined in the introduction. Consider $\mathscr{d}(\cdot,\cdot)$ in

Figures (9)

  • Figure 1: Average test accuracy on the CIFAR-10 dataset with varying numbers of annotators. The shaded areas are constructed using the associated standard deviations.
  • Figure 2: Average accuracy on the CIFAR-10 and CIFAR-100 datasets ($R=5$) for different $\epsilon$ values.
  • Figure 3: Average test accuracy on the CIFAR-100 dataset with varying numbers of annotators. The shaded areas represent standard deviations.
  • Figure 4: Average accuracy of robust pseudo-labels on the CIFAR-10 and CIFAR-100 datasets ($R=5$) during the training process.
  • Figure 5: Average accuracy of robust pseudo-labels on the CIFAR-10 dataset with varying numbers of annotators during the training process.
  • ...and 4 more figures

Theorems & Definitions (29)

  • Remark 2.1
  • Definition 2.1: $p$-Wasserstein distance [blanchet2019quantifying]
  • Proposition 2.1: dual problem
  • Remark 2.2
  • Remark 2.3
  • Theorem 2.2
  • Corollary 2.3: Empirical Robust Minimizer
  • Theorem 3.1: Optimal Action for Single Data Point: Binary Case
  • Remark 3.1
  • Remark 3.2
  • ...and 19 more
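
For orientation, the $p$-Wasserstein distance named in Definition 2.1 is commonly written as below. This is the standard textbook form, given here as an assumption about what the definition covers; the paper's own notation may differ. The symbols $c$ (a cost function on the label space $\mathcal{Y}$) and $\Pi(P,Q)$ (the set of couplings with marginals $P$ and $Q$) are not taken from the text above.

$$W_p(P,Q)=\left(\inf_{\pi\in\Pi(P,Q)}\int_{\mathcal{Y}\times\mathcal{Y}} c(y,y')^{p}\,\mathrm{d}\pi(y,y')\right)^{1/p},\qquad p\ge 1.$$

A distance-based ambiguity set of radius $\epsilon$ around a reference distribution $P$ then takes the form $\{Q : W_p(P,Q)\le\epsilon\}$, with $\epsilon$ controlling how far the worst-case distribution may move from the reference.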