An Embedding is Worth a Thousand Noisy Labels

Francesco Di Salvo; Sebastian Doerrich; Ines Rieger; Christian Ledig

TL;DR

This work addresses label noise with a training-free approach that operates directly in embedding space. It introduces WANN, a Weighted Adaptive Nearest Neighbor method that assigns each training sample a reliability score $\eta$, sets the neighborhood size of a test sample to $k_T = \frac{1}{\eta}$ (with $k_{\min}=11$ and $k_{\max}=51$), where $\eta$ is the score of the test sample's closest training sample, and weights each neighbor's vote by its reliability. Filtered LDA (FLDA) complements this with dimensionality reduction that is robust to noisy labels. Using foundation-model embeddings (notably DINOv2 Large, 1024-d), the method is robust across diverse noise types and data regimes, often outperforming robust loss functions, while allowing 10x to 100x smaller embeddings. Its neighborhood-based predictions are explainable, and it generalizes to medical data and long-tailed distributions, which makes it a scalable alternative to heavy neural network training in noisy-label settings.

Abstract

The performance of deep neural networks scales with dataset size and label quality, rendering the efficient mitigation of low-quality data annotations crucial for building robust and cost-effective systems. Existing strategies to address label noise exhibit severe limitations due to computational complexity and application dependency. In this work, we propose WANN, a Weighted Adaptive Nearest Neighbor approach that builds on self-supervised feature representations obtained from foundation models. To guide the weighted voting scheme, we introduce a reliability score $\eta$, which measures the likelihood of a data label being correct. WANN outperforms reference methods, including a linear layer trained with robust loss functions, on diverse datasets of varying size and under various noise types and severities. WANN also exhibits superior generalization on imbalanced data compared to both Adaptive-NNs (ANN) and fixed k-NNs. Furthermore, the proposed weighting scheme enhances supervised dimensionality reduction under noisy labels. This yields a significant boost in classification performance with 10x and 100x smaller image embeddings, minimizing latency and storage requirements. Our approach, emphasizing efficiency and explainability, emerges as a simple, robust solution to overcome inherent limitations of deep neural network training. The code is available at https://github.com/francescodisalvo05/wann-noisy-labels .
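
The following minimal Python sketch illustrates the idea summarized above: a per-sample reliability score, an adaptive neighborhood size $k_T = \frac{1}{\eta}$, and a reliability-weighted vote. It is not the authors' implementation (see the linked repository for that). The function names `reliability_scores` and `wann_predict` are hypothetical, and several details are assumptions made only for illustration: Euclidean distance, excluding a sample from its own neighborhood, and falling back to $\eta = \frac{1}{k_{\max}}$ when no $k$ in $[k_{\min}, k_{\max}]$ recovers the given label.

```python
import numpy as np


def reliability_scores(X, y, k_min=11, k_max=51):
    """eta_i = 1 / k_i, with k_i the smallest k in [k_min, k_max] whose
    majority vote over the k nearest other training points equals y_i."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    n_classes = int(y.max()) + 1
    k_max = min(k_max, n - 1)  # cannot use more neighbors than exist
    k_min = min(k_min, k_max)
    # Pairwise squared Euclidean distances (O(n^2) memory, fine for a sketch).
    dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # assumption: a sample does not vote for itself
    order = np.argsort(dist, axis=1)[:, :k_max]
    eta = np.full(n, 1.0 / k_max)  # assumption: fallback when no k recovers y_i
    for i in range(n):
        neigh_labels = y[order[i]]
        for k in range(k_min, k_max + 1):
            votes = np.bincount(neigh_labels[:k], minlength=n_classes)
            if votes.argmax() == y[i]:
                eta[i] = 1.0 / k
                break
    return eta


def wann_predict(X_train, y_train, eta, X_test):
    """Predict with k_T = 1 / eta of the closest training sample and a
    vote in which each neighbor counts with weight eta."""
    X_train = np.asarray(X_train, dtype=float)
    n_classes = int(y_train.max()) + 1
    preds = np.empty(len(X_test), dtype=int)
    for t, x in enumerate(np.asarray(X_test, dtype=float)):
        order = np.argsort(((X_train - x) ** 2).sum(axis=-1))
        k_t = int(round(1.0 / eta[order[0]]))  # adaptive neighborhood size
        neigh = order[:k_t]
        votes = np.bincount(y_train[neigh], weights=eta[neigh],
                            minlength=n_classes)
        preds[t] = votes.argmax()
    return preds
```

In this sketch, `X` would hold precomputed image embeddings (for example, 1024-d DINOv2 Large features) and `y` the possibly noisy integer labels. Calling `eta = reliability_scores(X, y)` once and then `wann_predict(X, y, eta, X_test)` requires no gradient-based training. Samples whose labels disagree with their neighborhood receive a small `eta` and therefore contribute little to any vote they take part in.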

Paper Structure (17 sections, 4 equations, 7 figures, 7 tables, 1 algorithm)

Introduction
Related works
Method
Experiments and results
Backbone
Real-world noisy labels
Limited noisy data
Limited real-world noisy data
Generalization to medical data
Long-tailed noisy data
Dimensionality reduction
Explainability benefits
Discussion and conclusions
Generalizability across backbones
Robustness of reliability score
...and 2 more sections

Figures (7)

Figure 1: Illustration of the proposed Weighted Adaptive Nearest Neighbor (WANN) algorithm. Initially, a reliability score ($\eta$) is computed for each training observation, representing the inverse of the minimum number of samples needed for a correct prediction. During inference, the adaptive neighborhood size ($k_T$) of each test observation ($x_T$) is determined based on the reliability score of its closest training sample ($k_T = \frac{1}{\eta} = 3$). A weighted majority vote determines the final label, reducing the impact of noisy labels.
Figure 2: Example test images and their top-$3$ closest training samples from the STL-10 dataset (Coates et al., 2011). The left panel shows two test images with wrong labels, while the right panel shows images with ambiguous labels, where a single image contains multiple known objects, for instance a bird with a ship in the background, or a dog fighting with a cat.
Figure 3: t-SNE projection of a CIFAR-10 subset and its noisy (30% asymmetric) counterpart.
Figure 4: Average accuracy ($\uparrow$) and 95% confidence interval on stratified subsets of Animal-10N.
Figure 5: Accuracy of fixed and adaptive $k$-NN based approaches on CIFAR-10LT and CIFAR-100LT, with $1\%$ and $10\%$ imbalance ratios. The imbalance ratio denotes the ratio between the sizes of the least and the most frequent class, with an exponentially decaying imbalance across the classes in between. WANN is close to the best performance for every noise pattern and severity, and it is significantly the best method across all experiments and datasets (Wilcoxon signed-rank test, $p<0.05$).
...and 2 more figures
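
The long-tailed setup described in the Figure 5 caption can be illustrated with a short sketch. It assumes the commonly used exponential profile $n_i = n_{\max} \cdot r^{i/(C-1)}$ for class index $i = 0, \dots, C-1$, where $r$ is the imbalance ratio and $C$ the number of classes. The exact construction used in the paper is not stated in this summary, and the helper name `long_tailed_counts` is hypothetical.

```python
def long_tailed_counts(n_max, n_classes, imbalance_ratio):
    """Per-class sample counts decaying exponentially from n_max
    (most frequent class) to n_max * imbalance_ratio (least frequent)."""
    return [
        # Assumed profile: n_i = n_max * r ** (i / (C - 1)), rounded to an int.
        round(n_max * imbalance_ratio ** (i / (n_classes - 1)))
        for i in range(n_classes)
    ]


# CIFAR-10 has 5000 training images per class; a 1% imbalance ratio
# keeps all 5000 for the first class and 50 for the last.
cifar10_lt_counts = long_tailed_counts(5000, 10, 0.01)
```

Under this assumed profile, a $1\%$ ratio leaves the rarest class with a hundredth of the samples of the most frequent one, which is the regime where a fixed $k$ can easily exceed the number of available same-class neighbors.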
