Learning to Clean: Reinforcement Learning for Noisy Label Correction

Marzi Heidari, Hanping Zhang, Yuhong Guo

TL;DR

This work addresses learning with noisy labels by reframing label correction as a sequential decision problem. It introduces RLNLC, an actor-critic policy that leverages a deep embedding $f_\theta$ to compute neighborhood-based label predictions and decide which labels to correct. The reward combines a Label Consistency Reward and a Noisy Label Alignment Reward via $k$-nearest neighbors in embedding spaces, guiding effective corrections. Empirical results on CIFAR-10/100-IDN, Animal-10N, and Food-101N show consistent gains over state-of-the-art methods, including under substantial noise, highlighting RLNLC’s robustness and practical impact for data cleaning in real-world noisy-label settings.

Abstract

The challenge of learning with noisy labels is significant in machine learning, as it can severely degrade the performance of prediction models if not addressed properly. This paper introduces a novel framework that conceptualizes noisy label correction as a reinforcement learning (RL) problem. The proposed approach, Reinforcement Learning for Noisy Label Correction (RLNLC), defines a comprehensive state space representing data and their associated labels, an action space that indicates possible label corrections, and a reward mechanism that evaluates the efficacy of label corrections. RLNLC learns a deep feature representation based policy network to perform label correction through reinforcement learning, utilizing an actor-critic method. The learned policy is subsequently deployed to iteratively correct noisy training labels and facilitate the training of the prediction model. The effectiveness of RLNLC is demonstrated through extensive experiments on multiple benchmark datasets, where it consistently outperforms existing state-of-the-art techniques for learning with noisy labels.
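
To make the neighborhood idea concrete, the following is a minimal sketch of a $k$-nearest-neighbor label prediction in an embedding space, together with a simple consistency score. It is an illustration under stated assumptions, not the paper's method: the function names `knn_label_prediction` and `label_consistency`, the use of cosine similarity, the toy embeddings, and the majority-vote correction (which stands in for the learned actor-critic policy) are all hypothetical choices made for this example.

```python
import numpy as np

def knn_label_prediction(embeddings, labels, num_classes, k=3):
    """Average the one-hot labels of each point's k nearest neighbors.

    Assumption: neighbors are ranked by cosine similarity; the paper's
    exact similarity measure is not stated in this summary.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)           # a point is not its own neighbor
    nbrs = np.argsort(-sim, axis=1)[:, :k]   # indices of the k most similar points
    one_hot = np.eye(num_classes)[labels]    # shape (N, C)
    return one_hot[nbrs].mean(axis=1)        # shape (N, C), rows sum to 1

def label_consistency(pred, labels):
    """Mean probability mass that neighbors assign to each point's current label."""
    return float(pred[np.arange(len(labels)), labels].mean())

# Toy embeddings forming two well-separated clusters (hypothetical data).
embeddings = np.array([
    [1.00, 0.00], [0.90, 0.10], [0.80, 0.20], [0.95, 0.05],
    [0.00, 1.00], [0.10, 0.90], [0.20, 0.80], [0.05, 0.95],
])
# Index 2 sits in the first cluster but carries the second cluster's label.
noisy_labels = np.array([0, 0, 1, 0, 1, 1, 1, 1])

pred = knn_label_prediction(embeddings, noisy_labels, num_classes=2, k=3)
before = label_consistency(pred, noisy_labels)

# Stand-in for the learned policy: relabel each point by neighbor majority vote.
corrected = pred.argmax(axis=1)
pred_after = knn_label_prediction(embeddings, corrected, num_classes=2, k=3)
after = label_consistency(pred_after, corrected)

print(corrected.tolist())                  # [0, 0, 0, 0, 1, 1, 1, 1]
print(round(before, 2), round(after, 2))   # 0.75 1.0
```

In this toy case the single mislabeled point is flipped back to its cluster's label and the consistency score rises from 0.75 to 1.0. A score of this kind is only a rough analogue of a label consistency signal; the reward described above additionally includes a noisy label alignment term, which this sketch does not model.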

Paper Structure

This paper contains 28 sections, 14 equations, 3 figures, 5 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overview of the proposed RLNLC. Each data point $\mathbf{x}_i$ is associated with an initial label $\widehat{\mathbf{y}}_i$ that is potentially noisy. The policy network $\pi_\theta$ is constructed over a deep feature extraction network $f_\theta$, and it determines actions based on the current state of the data $\boldsymbol{s}^t=\{(\mathbf{x}_i, \widehat{\mathbf{y}}^t_i)\}_{i=1}^N$, resulting in label corrections. The updated labels subsequently lead to the next state. The reward function is designed to evaluate the labels in an instance-dependent manner, capturing dataset-wide label consistency and the inter-subset alignment of the noisy labels with clean labels. The policy function is learned using an actor-critic method.
  • Figure 2: Label correction accuracy on the training set by deploying the trained policy function for $T^\prime$ time-steps. Results on CIFAR10-IDN and CIFAR100-IDN with various noise rates are plotted.
  • Figure 3: Sensitivity analysis for four hyper-parameters, $k$, $N_b$, $\lambda$, and $T$, on CIFAR100-IDN with 0.50 noise rate.