Confidence HNC: A Network Flow Technique for Binary Classification with Noisy Labels

Dorit Hochbaum, Torpong Nitayanont

TL;DR

Confidence HNC (CHNC) extends Hochbaum's Normalized Cut (HNC) to handle noisy labels by assigning confidence weights to labeled samples and solving a parametric minimum-cut problem. The method proceeds in two stages: it first computes confidence weights via nested parametric cuts, then solves CHNC with these weights to classify unlabeled samples and detect noisy labels. CHNC generalizes HNC, is computed efficiently with a single parametric min-cut run, and outperforms DivideMix, Confident Learning, and Co-teaching+ in accuracy, balanced accuracy, and noise detection on synthetic and real datasets. The approach is a graph-based semi-supervised method that is robust to label noise and yields interpretable confidence scores for labeled data, which makes it useful in applications that must learn reliably from imperfect labels.
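The TL;DR reports accuracy, balanced accuracy, and noise-detection performance. A minimal sketch of how such metrics could be computed follows, assuming binary labels in {0, 1} and assuming that a labeled sample is flagged as noisy when the classifier's output disagrees with its given label (the interpretation stated in the abstract). The function names, the use of precision/recall for noise detection, and the toy labels are illustrative assumptions, not the paper's evaluation code or data.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred, classes=(0, 1)):
    """Mean per-class recall, so a majority class cannot dominate the score."""
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        if idx:  # skip classes absent from y_true
            recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def noise_detection(given, clean, predicted):
    """Precision/recall of flagging a labeled sample as noisy whenever the
    predicted label disagrees with its given label (assumed flagging rule)."""
    flagged = {i for i, (g, p) in enumerate(zip(given, predicted)) if g != p}
    noisy = {i for i, (g, c) in enumerate(zip(given, clean)) if g != c}
    hits = len(flagged & noisy)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(noisy) if noisy else 0.0
    return precision, recall


# Hypothetical toy labels for illustration only (not data from the paper).
clean = [0, 0, 0, 0, 1, 1]
given = [0, 1, 0, 0, 0, 1]      # samples 1 and 4 carry corrupted labels
predicted = [0, 0, 0, 0, 0, 1]  # classifier output on the same samples
```

Here the classifier overturns the corrupted label of sample 1 but agrees with the corrupted label of sample 4, so noise detection is precise but incomplete, and balanced accuracy penalizes the miss on the smaller class more than plain accuracy does.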

Abstract

We consider here a classification method that balances two objectives: large similarity among the samples within the cluster, and large dissimilarity between the cluster and its complement. The method, referred to as HNC or SNC, requires seed nodes, or labeled samples, at least one of which is in the cluster and at least one in the complement. Beyond these seeds, the method relies only on the relationships between the samples. The contribution here is a new method for data with noisy labels, based on HNC and called Confidence HNC, in which we introduce confidence weights that allow the given labels of labeled samples to be violated, with a penalty that reflects the perceived correctness of each given label. A violated label is interpreted as noisy. The method represents the problem as a graph problem with hyperparameters, which is solved very efficiently by the network-flow technique of parametric cut. We compare the performance of the new method with leading algorithms on both real and synthetic data with noisy labels and demonstrate that it delivers improved classification accuracy as well as improved noise-detection capability.
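The idea of soft, violable labels can be illustrated with a single s-t minimum cut. The sketch below assumes a simplified objective that may differ from the paper's CHNC formulation (eq:CHNC): minimize $C(S,\overline{S}) - \lambda \sum_{i \in S} d_i$ (with $\lambda \geq 0$) plus a penalty $c_i$ for every labeled sample placed on the side opposite its given label. Here $d_i$ is the weighted degree, and the confidence weights `conf` are supplied by hand (hypothetical values), whereas the paper computes them via nested parametric cuts. Label-1 samples receive a source arc of capacity $c_i$, label-0 samples a sink arc of capacity $c_i$, and every node a source arc $\lambda d_i$. The max-flow routine is a plain Edmonds-Karp, chosen for self-containment rather than the pseudoflow or parametric-cut algorithms the paper relies on.

```python
from collections import deque


def max_flow_min_cut(cap, s, t):
    """Edmonds-Karp on a dict-of-dicts capacity map.
    Returns the set of nodes reachable from s in the final residual graph,
    i.e. the source side of the minimal minimum cut."""
    res = {u: dict(vs) for u, vs in cap.items()}
    for u, vs in cap.items():
        for v in vs:
            res.setdefault(v, {}).setdefault(u, 0.0)  # reverse residual arcs
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:  # BFS for a shortest augmenting path
            u = q.popleft()
            for v, c in res[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return set(parent)
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(res[u][v] for u, v in path)  # bottleneck capacity
        for u, v in path:
            res[u][v] -= delta
            res[v][u] += delta


def chnc_sketch(weights, labels, conf, lam):
    """Hypothetical helper. weights: {(i, j): w_ij}; labels: {i: 0 or 1};
    conf: {i: c_i} for labeled i; lam >= 0. Returns (predictions, noisy)."""
    nodes = {i for e in weights for i in e}
    deg = {i: 0.0 for i in nodes}
    cap = {"s": {}, "t": {}}
    for i in nodes:
        cap[i] = {}
    for (i, j), w in weights.items():  # undirected similarity -> two arcs
        deg[i] += w
        deg[j] += w
        cap[i][j] = cap[i].get(j, 0.0) + w
        cap[j][i] = cap[j].get(i, 0.0) + w
    for i in nodes:
        src = lam * deg[i]  # cut when i is left out of S
        if labels.get(i) == 1:
            src += conf[i]  # cutting it means violating a positive label
        elif labels.get(i) == 0:
            cap[i]["t"] = conf[i]  # cutting it means violating a negative label
        if src > 0:
            cap["s"][i] = src
    source_side = max_flow_min_cut(cap, "s", "t")
    pred = {i: int(i in source_side) for i in nodes}
    noisy = sorted(i for i, y in labels.items() if pred[i] != y)
    return pred, noisy


# Toy instance (made-up values): two tight triangles joined by a weak bridge;
# sample 2 sits in the first triangle but carries a low-confidence label 0.
weights = {(0, 1): 3.0, (0, 2): 3.0, (1, 2): 3.0,
           (3, 4): 3.0, (3, 5): 3.0, (4, 5): 3.0, (2, 3): 0.5}
labels = {0: 1, 1: 1, 2: 0, 4: 0, 5: 0}
conf = {0: 2.0, 1: 2.0, 2: 0.5, 4: 2.0, 5: 2.0}
pred, noisy = chnc_sketch(weights, labels, conf, lam=0.0)
```

With these weights, keeping sample 2 with its triangle costs the bridge (0.5) plus its low confidence penalty (0.5), which is cheaper than cutting its two strong edges, so the cut overrides its label and reports it as noisy. Raising `conf[2]` above the cost of those edges would instead keep the given label.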

Paper Structure

This paper contains 32 sections, 5 theorems, 13 equations, 9 figures, 11 tables.

Key Result

Lemma 3.1

(Nested Cut Property) [gallo1989fast, hochbaum1998pseudoflow, hochbaum2008pseudoflow] Given a parametric flow graph $G_{st}(\lambda)$ and a sequence of parameter values $\lambda_1 < \lambda_2 < \dots < \lambda_q$, the corresponding minimum cut partitions $(S_1, \overline{S}_1), (S_2, \overline{S}_2), \dots, (S_q, \overline{S}_q)$ …
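The nested cut property can be checked numerically on a tiny instance. The sketch below is an illustration under stated assumptions, not the paper's graph: it uses a hypothetical 6-node similarity graph, fixes one seed on each side, and minimizes the simplified parametric objective $C(S,\overline{S}) - \lambda \sum_{i \in S} d_i$ by brute-force enumeration instead of a parametric max-flow, so that the smallest minimizer (the minimal source set) is easy to pick out. Because the objective is a cut function plus $\lambda$ times nonnegative node weights, the minimal source sets are expected to grow monotonically as $\lambda$ increases.

```python
from itertools import combinations

# Hypothetical toy graph: symmetric similarity weights w_ij (made-up values).
EDGES = {(0, 1): 3.0, (1, 2): 1.0, (2, 3): 2.0, (3, 4): 1.5,
         (4, 5): 3.0, (1, 3): 0.5, (2, 5): 0.5}
N = 6
POS_SEED, NEG_SEED = 0, 5  # one seed in the cluster, one in its complement


def degree(i):
    """Weighted degree d_i."""
    return sum(w for (a, b), w in EDGES.items() if i in (a, b))


def objective(S, lam):
    """C(S, S_bar) - lam * sum_{i in S} d_i (assumed parametric form)."""
    cut = sum(w for (a, b), w in EDGES.items() if (a in S) != (b in S))
    return cut - lam * sum(degree(i) for i in S)


def minimal_source_set(lam):
    """Brute force over sets containing POS_SEED and excluding NEG_SEED.
    Sets are visited in order of size and replaced only on strict
    improvement, so ties resolve to the smallest (minimal) minimizer."""
    free = [i for i in range(N) if i not in (POS_SEED, NEG_SEED)]
    best_val, best_set = None, None
    for k in range(len(free) + 1):
        for extra in combinations(free, k):
            S = frozenset((POS_SEED,) + extra)
            val = objective(S, lam)
            if best_val is None or val < best_val - 1e-12:
                best_val, best_set = val, S
    return best_set


lams = [0.0, 0.1, 0.2, 0.4, 0.8, 2.0]
sets = [minimal_source_set(lam) for lam in lams]
for smaller, larger in zip(sets, sets[1:]):
    assert smaller <= larger  # nested: S_k is a subset of S_{k+1}
```

Enumeration is exponential and only serves to make the property visible; the point of the lemma is that a parametric-cut algorithm can exploit this nesting to obtain all breakpoints in roughly the time of a single min-cut computation.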

Figures (9)

  • Figure 1: Associated graph $G_{st}(\lambda)$ whose minimum cut provides a solution for (eq:Classification-HNC) when $\lambda \geq 0$.
  • Figure 2: Associated graph $G^{'}_{st}(\lambda)$ whose minimum cut provides a solution for (eq:Classification-HNC), generalized to both $\lambda > 0$ and $\lambda < 0$.
  • Figure 3: Associated graph $G_{st}^c(\lambda)$ for CHNC (eq:CHNC).
  • Figure 4: Associated graphs for computing the confidence weights.
  • Figure 5: Histograms of the accuracy improvement of CHNC over (a) DivideMix, (b) Confident Learning and (c) Co-teaching+ on 2160 synthetic datasets with 20% noise. The area to the right of the dashed line corresponds to datasets where CHNC outperforms the baseline. The red line indicates the mean improvement.
  • ...and 4 more figures
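Figures 1 and 2 depict graphs whose minimum cuts solve the HNC problem. Assuming that (eq:Classification-HNC) has the linearized form below, with $d_i$ the weighted degree of node $i$, positive seeds forced into $S$ and negative seeds forced into $\overline{S}$ (an assumption, since the equation itself is not reproduced in this section), the source and sink arc capacities follow from rewriting the linear term as a nonnegative cut cost:

```latex
% Assumed form of (eq:Classification-HNC)
\min_{S}\; C(S,\overline{S}) - \lambda \sum_{i \in S} d_i ,
\qquad D = \sum_{i \in V} d_i .

% Case \lambda \ge 0 (Figure 1): move the linear term onto the complement
C(S,\overline{S}) - \lambda \sum_{i \in S} d_i
  = C(S,\overline{S}) + \lambda \sum_{i \in \overline{S}} d_i - \lambda D
\;\Rightarrow\; \text{arc } (s,i) \text{ with capacity } \lambda d_i .

% Case \lambda < 0 (the extension covered by Figure 2)
C(S,\overline{S}) - \lambda \sum_{i \in S} d_i
  = C(S,\overline{S}) + |\lambda| \sum_{i \in S} d_i
\;\Rightarrow\; \text{arc } (i,t) \text{ with capacity } |\lambda| d_i .
```

In both cases each similarity $w_{ij}$ becomes a pair of arcs $(i,j)$ and $(j,i)$ of capacity $w_{ij}$, seeds are attached to $s$ or $t$ with infinite capacity, and the constant $-\lambda D$ does not change the minimizer.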

Theorems & Definitions (9)

  • Lemma 3.1: Nested Cut Property
  • Lemma 4.1
  • Proof of Lemma 4.1
  • Lemma 4.2
  • Proof of Lemma 4.2
  • Lemma 4.3
  • Proof of Lemma 4.3
  • Theorem 4.4
  • Proof of Theorem 4.4