Drainage: A Unifying Framework for Addressing Class Uncertainty

Yasser Taha, Grégoire Montavon, Nils Körber

TL;DR

Drainage introduces a drainage node appended to the output layer to explicitly allocate uncertainty, enabling robust handling of noisy and ambiguous labels. The drainage loss incentivizes reallocating probability to the drainage and target classes while suppressing non-targets, with proven monotonicity and convexity properties. Empirically, Drainage yields substantial gains under high noise on CIFAR-10/100 and competitive results on WebVision, Clothing-1M, and ILSVRC-12, while providing qualitative evidence of denoising and more stable decision boundaries. The approach also supports Open Set Recognition by supplying an explicit unknown score, illustrating its versatility for both robust classification and open-set tasks with end-to-end training.

Abstract

Modern deep learning faces significant challenges with noisy labels, class ambiguity, and the need to robustly reject out-of-distribution or corrupted samples. In this work, we propose a unified framework based on the concept of a "drainage node", which we add at the output of the network. The node serves to reallocate probability mass toward uncertainty while preserving desirable properties such as end-to-end training and differentiability. This mechanism provides a natural escape route for highly ambiguous, anomalous, or noisy samples, and is particularly relevant for instance-dependent and asymmetric label noise. In systematic experiments that add varying proportions of instance-dependent or asymmetric noise to CIFAR-10/100 labels, our drainage formulation achieves an accuracy increase of up to 9% over existing approaches in the high-noise regime. Our results on real-world datasets, such as mini-WebVision, mini-ImageNet and Clothing-1M, match or surpass existing state-of-the-art methods. Qualitative analysis reveals a denoising effect, where the drainage neuron consistently absorbs corrupt, mislabeled, or outlier data, leading to more stable decision boundaries. Furthermore, our drainage formulation enables applications well beyond classification, with immediate benefits for web-scale, semi-supervised dataset cleaning, and open-set applications.
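To make the idea of a drainage node concrete, the following is a minimal sketch and not the authors' implementation. It assumes the network emits K class logits followed by one extra drainage logit in the last position, that a single softmax is taken over all K+1 outputs, and that a sample is predicted as "drainage" when the drainage probability is the largest. The function names `softmax` and `predict_with_drainage` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_drainage(logits):
    """Split a softmax over K+1 outputs into class probabilities and p_d.

    Assumption: the last logit belongs to the drainage node, and the
    prediction is the argmax over all K+1 outputs.
    """
    probs = softmax(logits)
    class_probs, p_d = probs[:-1], probs[-1]
    best = max(range(len(probs)), key=probs.__getitem__)
    label = "drainage" if best == len(probs) - 1 else best
    return label, class_probs, p_d

# Illustrative logits only (three classes plus one drainage output).
confident = predict_with_drainage([4.0, 0.5, 0.2, 0.0])
ambiguous = predict_with_drainage([1.0, 1.0, 0.9, 2.5])
print(confident[0], round(confident[2], 3))
print(ambiguous[0], round(ambiguous[2], 3))
```

In this sketch the returned `p_d` can also be read as an explicit "unknown" score, which is the role the drainage output plays in the open-set setting described above.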

Paper Structure

This paper contains 22 sections, 3 theorems, 11 equations, 8 figures, 6 tables.

Key Result

Proposition 1

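Any reallocation of probability from the non-target classes to the target class never increases the drainage loss. Specifically, $\ell(p_t+\delta,p_d,p_\mathcal{J}-\delta) \leq \ell(p_t,p_d,p_\mathcal{J})$ for any probability vector $(p_t,p_d,p_\mathcal{J})$ and perturbation $0 \leq \delta \leq p_\mathcal{J}$, where $p_t$, $p_d$, and $p_\mathcal{J}$ denote the probability assigned to the target class, the drainage node, and the non-target classes, respectively.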
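The inequality in Proposition 1 can be checked numerically for any candidate loss $\ell(p_t, p_d, p_\mathcal{J})$. The sketch below samples random probability vectors and perturbations $\delta$ and counts violations. The drainage loss itself is not given in this summary, so `stand_in_loss` is a hypothetical placeholder chosen only to exercise the checker; it is not the paper's loss. The helper names are likewise assumptions.

```python
import math
import random

def stand_in_loss(p_t, p_d, p_j):
    # HYPOTHETICAL placeholder, NOT the paper's drainage loss.
    # It only rewards mass on the target and (less so) on drainage.
    return -math.log(p_t + 0.5 * p_d + 1e-12)

def violates_proposition_1(loss, p_t, p_d, p_j, delta, tol=1e-12):
    """True if moving delta from non-target to target increases the loss."""
    return loss(p_t + delta, p_d, p_j - delta) > loss(p_t, p_d, p_j) + tol

def count_violations(loss, trials=10_000, seed=0):
    """Sample random (p_t, p_d, p_J) and 0 <= delta <= p_J; count violations."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        a, b, c = rng.random(), rng.random(), rng.random()
        s = a + b + c
        p_t, p_d, p_j = a / s, b / s, c / s
        delta = rng.uniform(0.0, p_j)
        if violates_proposition_1(loss, p_t, p_d, p_j, delta):
            violations += 1
    return violations

print(count_violations(stand_in_loss))
```

A random search of this kind can only falsify the property for a given loss, not prove it; the proposition itself is what guarantees the inequality for the drainage loss.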

Figures (8)

  • Figure 1: Different sources of class uncertainty, and the way they are handled by a classical classification model (e.g. softmax / cross-entropy) and by our proposed drainage model. Our drainage-based approach is more robust to mislabelings, and allows ambiguous and outlier instances to be predicted as 'drainage' rather than classified arbitrarily.
  • Figure 1: Effect of changing $\alpha$ and $\beta$ in the MNIST toy example on the percentage of samples predicted as drainage per class. The MNIST toy example is depicted in Figure 2 (panel C). In-distribution classes are shown in blue and out-of-distribution classes in red.
  • Figure 2: A. Analysis of the drainage loss ($\alpha,\beta=1$) under different drainage levels $p_d$ and target allocation levels $p_t$, exhibiting the monotonicity properties predicted by the propositions on reallocating probability from non-target to target and from non-target to drainage. B. Application of our method on a two-dimensional toy example, where we observe the emergence of drainage-dominated regions (in gray) and the effect drainage has in refining the decision boundaries between classes. C. Application of our method to the MNIST data. Here, we randomly relabel all training instances of classes 7, 8, 9 to labels 0 to 6. This causes instances from classes 7, 8, 9 to be systematically predicted as drainage on the test set.
  • Figure 2: Ablation study of the drainage loss on CIFAR-10 (left) and CIFAR-100 (right). We evaluate the effect of different regularization strengths, testing L1 and L2 coefficients of $1 \times 10^{-4}$, $2 \times 10^{-4}$, $1 \times 10^{-5}$ and $2 \times 10^{-5}$. We grid-search for the parameter $\alpha$ while setting $\beta=\alpha^{-1}$.
  • Figure 3: Examples from four Mini-WebVision mentornet classes, with validation samples ordered by increasing drainage response $p_d$.
  • ...and 3 more figures
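
The relabeling step described for the MNIST toy example (Figure 2, panel C) can be sketched as follows. This is an illustration under assumptions: the caption only says the relabeling is random, so uniform sampling over labels 0 to 6 is assumed here, and the function name `relabel_held_out_classes` and its parameters are hypothetical.

```python
import random

def relabel_held_out_classes(labels, held_out=(7, 8, 9), targets=range(7), seed=0):
    """Replace every label in `held_out` with a random label from `targets`.

    Assumption: replacement labels are drawn uniformly; labels outside
    `held_out` are left unchanged.
    """
    rng = random.Random(seed)
    targets = list(targets)
    return [rng.choice(targets) if y in held_out else y for y in labels]

# Tiny illustrative label list, not real MNIST labels.
clean = [0, 7, 3, 9, 8, 6, 7, 1]
noisy = relabel_held_out_classes(clean)
print(noisy)
```

After this step no training example carries a label of 7, 8, or 9, so those digits act as out-of-distribution inputs whose labels are uninformative.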

Theorems & Definitions (3)

  • Proposition 1
  • Proposition 2
  • Proposition 3