Maximising the Utility of Validation Sets for Imbalanced Noisy-label Meta-learning

Dung Anh Hoang, Cuong Nguyen, Belagiannis Vasileios, Thanh-Toan Do, Gustavo Carneiro

TL;DR

This work tackles meta-learning under class imbalance and label noise by removing the reliance on manually curated validation sets. It introduces INOLML, an iterative method that automatically constructs a pseudo-clean, class-balanced, and informative validation set by jointly maximizing its informativeness and label cleanliness. The approach combines pseudo-clean sample detection, a bi-level optimization for validation-set selection, and dynamic pseudo-label refinement, and it achieves state-of-the-art results on both synthetic and real-world noisy-label benchmarks, including WebVision and Red mini-ImageNet. Overall, INOLML improves the robustness and scalability of meta-learning under label noise, reduces the dependency on costly clean validation data, and offers practical gains in imbalanced-class settings.

Abstract

Meta-learning is an effective method to handle imbalanced and noisy-label learning, but it depends on a validation set containing randomly selected, manually labelled and class-balanced samples. The random selection, manual labelling and balancing of this validation set are not only sub-optimal for meta-learning, but they also scale poorly with the number of classes. Recent meta-learning papers have therefore proposed ad-hoc heuristics to automatically build and label this validation set, but these heuristics remain sub-optimal for meta-learning. In this paper, we analyse the meta-learning algorithm and propose new criteria to characterise the utility of the validation set, based on: 1) the informativeness of the validation set; 2) the class distribution balance of the set; and 3) the correctness of the labels of the set. Furthermore, we propose a new imbalanced noisy-label meta-learning (INOLML) algorithm that automatically builds a validation set by maximising its utility using the criteria above. Our method shows significant improvements over previous meta-learning approaches and sets a new state of the art on several benchmarks.
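The three criteria above can be made concrete with a small selection routine. This is a minimal sketch under stated assumptions, not the utility the paper optimises: each sample is assumed to come with a hypothetical clean-label probability `p_clean` and a precomputed `informativeness` score, and the product score, the `min_clean` cut-off and the fixed per-class quota are illustrative choices.

```python
from collections import defaultdict


def build_validation_set(labels, p_clean, informativeness, per_class, min_clean=0.5):
    """Select up to `per_class` sample indices per class for a validation set.

    Hypothetical sketch of the three criteria, not the INOLML utility:
      * correctness: drop samples whose assumed clean probability is below `min_clean`;
      * informativeness: rank the remaining samples by informativeness * p_clean;
      * balance: take the same quota from every class.
    """
    candidates = defaultdict(list)
    for idx, (y, pc, info) in enumerate(zip(labels, p_clean, informativeness)):
        if pc >= min_clean:
            candidates[y].append((info * pc, idx))
    selected = []
    for y in sorted(candidates):
        ranked = sorted(candidates[y], reverse=True)  # highest score first
        selected.extend(idx for _, idx in ranked[:per_class])
    return sorted(selected)


# Toy data: two classes with three samples each (all values are made up).
labels = [0, 0, 0, 1, 1, 1]
p_clean = [0.9, 0.4, 0.8, 0.95, 0.7, 0.6]
informativeness = [0.2, 0.9, 0.5, 0.1, 0.3, 0.8]
val_idx = build_validation_set(labels, p_clean, informativeness, per_class=1)  # [2, 5]
```

Sample 1 has the highest informativeness score but is excluded because its assumed clean probability is low, which illustrates the tension between informativeness and label correctness. A class with fewer than `per_class` eligible samples simply contributes fewer indices, so this sketch does not guarantee strict balance.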

Paper Structure

This paper contains 30 sections, 19 equations, 5 figures, 12 tables and 1 algorithm.

Figures (5)

  • Figure 1: Main stages of INOLML: 1) classify the noisy-label samples from $\mathcal{D}$ into $\mathcal{D}^{(c)}$ (samples likely to have clean labels) and $\mathcal{D}^{(n)}$ (samples likely to have noisy labels); 2) build a validation set $\mathcal{D}^{(v)}$ of samples that are informative (from a meta-learning perspective), class-balanced and likely to have clean labels; and 3) train the meta-learning classifier with the training set $\mathcal{D}^{(t)} = \mathcal{D}^{(c)}\setminus \mathcal{D}^{(v)}$ and the validation set $\mathcal{D}^{(v)}$.
  • Figure 2: Comparison between 2-dimensional t-SNE representations of the samples selected by the naive utility in the naive optimisation equation and by our utility in the informativeness equation (samples of each class have different colours, and the selected validation samples per class are highlighted as blue dots with a black outline). The t-SNE representation is computed on the CIFAR10 dataset with a uniform noise rate of 40%.
  • Figure 3: Accuracy of the clean validation set $\mathcal{D}^{(v)}$ as training progresses, evaluated on different noise benchmarks.
  • Figure 4: Distribution of sample weights produced by different data-reweighting methods on CIFAR100 with a uniform noise rate of 0.8.
  • Figure 5: Accuracy (%) of INOLML with different sample-selection methods under uniform label noise.
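The stages in Figure 1 and the sample reweighting shown in Figure 4 can be illustrated with a toy sketch. It makes two loud assumptions. First, `split_clean_noisy` is a hypothetical stand-in for the paper's pseudo-clean detection that simply thresholds an assumed per-sample clean probability. Second, the reweighting step is the first-order rule of Learning to Reweight Examples (L2RW), applied to a linear logistic model, and is swapped in as a generic example of validation-driven meta-learning. It is not INOLML's training procedure.

```python
import math


# Hypothetical stand-in for stage 1: threshold an assumed per-sample clean probability.
def split_clean_noisy(p_clean, threshold=0.5):
    clean = {i for i, p in enumerate(p_clean) if p >= threshold}
    noisy = set(range(len(p_clean))) - clean
    return clean, noisy


def logistic_grad(w, x, y):
    """Gradient of the binary cross-entropy loss of a linear logistic model at (x, y)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * xi for xi in x]


def l2rw_weights(w, train, val):
    """L2RW-style first-order weights (swapped-in technique, not INOLML).

    A training sample is up-weighted when a gradient step on it would also
    reduce the validation loss, i.e. when its gradient has a positive dot
    product with the mean validation gradient.
    """
    g_val = [0.0] * len(w)
    for x, y in val:
        for k, gk in enumerate(logistic_grad(w, x, y)):
            g_val[k] += gk / len(val)
    # Clip negative alignments to zero, then normalise the weights to sum to 1.
    raw = [max(0.0, sum(a * b for a, b in zip(logistic_grad(w, x, y), g_val)))
           for x, y in train]
    total = sum(raw)
    return [r / total if total > 0 else 0.0 for r in raw]


# Index-set view of the pipeline; stage 2 is assumed to have picked sample 3.
clean, noisy = split_clean_noisy([0.9, 0.2, 0.7, 0.8])  # clean = {0, 2, 3}, noisy = {1}
val_idx = {3}
train_idx = clean - val_idx                             # D^(t) = D^(c) \ D^(v) = {0, 2}

# Separate toy reweighting example; features are [bias, x1].
w = [0.0, 0.0]
val = [([1.0, 1.0], 1), ([1.0, -1.0], 0)]
train = [([1.0, 2.0], 1), ([1.0, 2.0], 0), ([1.0, -1.0], 0)]  # second label looks flipped
weights = l2rw_weights(w, train, val)                   # approx. [0.667, 0.0, 0.333]
```

The training sample with the apparently flipped label receives zero weight because its gradient points against the validation gradient. Since the weights are driven entirely by the validation gradient, this kind of reweighting depends on the label correctness of $\mathcal{D}^{(v)}$.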