CANDERE-COACH: Reinforcement Learning from Noisy Feedback

Yuxuan Li; Srijita Das; Matthew E. Taylor

TL;DR

CANDERE-COACH is an algorithm that learns from noisy feedback given by a nonoptimal teacher. Its noise-filtering mechanism de-noises online feedback data, allowing the RL agent to learn successfully even when up to 40% of the teacher's feedback is incorrect.

Abstract

Reinforcement learning (RL) has been widely applied to many challenging tasks. However, to perform well it requires a good reward function, which is often sparse or manually engineered and therefore error-prone. Introducing human prior knowledge, through approaches such as imitation learning, learning from preferences, and inverse reinforcement learning, is often seen as a possible solution to this problem. Learning from feedback is another such framework, in which an RL agent learns from binary evaluative signals describing the teacher's (positive or negative) evaluation of the agent's action. However, these methods often assume that the teacher's evaluative feedback is perfect, which is restrictive. In practice, such feedback can be noisy due to limited teacher expertise or other exacerbating factors such as cognitive load, availability, and distraction. In this work, we propose the CANDERE-COACH algorithm, which is capable of learning from noisy feedback given by a nonoptimal teacher. We propose a noise-filtering mechanism to de-noise online feedback data, thereby enabling the RL agent to learn successfully when up to 40% of the teacher feedback is incorrect. Experiments on three common domains demonstrate the effectiveness of the proposed approach.
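To make the noise setting concrete, the following is a minimal sketch of how binary evaluative feedback with a fixed error rate could be simulated. It assumes feedback is encoded as +1 or -1 and that each signal is flipped independently with probability `noise_rate`; the function names and the flipping model are illustrative assumptions, not taken from the paper.

```python
import random


def noisy_feedback(true_label: int, noise_rate: float, rng: random.Random) -> int:
    """Return the teacher's feedback (+1 or -1), flipped with probability noise_rate.

    Hypothetical noise model: each feedback signal is corrupted independently.
    """
    if rng.random() < noise_rate:
        return -true_label  # the teacher gives the wrong evaluation
    return true_label


def empirical_flip_rate(n: int, noise_rate: float, seed: int = 0) -> float:
    """Fraction of n correct (+1) signals that end up flipped to -1."""
    rng = random.Random(seed)
    flips = sum(noisy_feedback(1, noise_rate, rng) == -1 for _ in range(n))
    return flips / n
```

With `noise_rate=0.4`, roughly 40% of the simulated signals are incorrect, which matches the highest noise level mentioned in the abstract.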

Paper Structure (29 sections, 5 equations, 19 figures, 8 tables, 1 algorithm)

Introduction
Related Work
Background
Classifier Augmented Noise Detecting and Relabelling COACH
Problem Statement
Methodology and Algorithm
Experimental evaluation and results
Results
Ablation studies
Extending to CANDERE-TAMER
Conclusion and Future Work
Deep COACH with noise
CANDERE-COACH with unlimited budget
More ablation study on online training
Ablation study on pretraining sizes
...and 14 more sections

Figures (19)

Figure 1: Overview of CANDERE-COACH. We use a classifier $C_\phi$ to filter noisy feedback and update the policy $\pi_\theta$ and $C_\phi$ with filtered minibatches.
Figure 2: (a) Cart Pole, (b) Minigrid Door Key, and (c) Lunar Lander are used for evaluation.
Figure 3: Performance of Deep COACH under different noise levels in Cart Pole. With an unlimited budget, Deep COACH can still learn, albeit slowly, under 40% noise, but its performance deteriorates significantly with a limited budget.
Figure 4: Performance of CANDERE-COACH in Cart Pole, Door Key, and Lunar Lander under 30% and 40% noise.
Figure 5: Performance of CANDERE-COACH in Cart Pole with a noisy pretraining dataset.
...and 14 more figures
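
The Figure 1 caption describes a classifier $C_\phi$ that filters noisy feedback before minibatches are used for updates. The exact filtering rule is not given in this excerpt, so the sketch below shows only one plausible reading: relabel a feedback signal when a classifier confidently disagrees with it, and keep it otherwise. The `predict_positive` callable, the `confidence` threshold, and the relabelling rule are all hypothetical.

```python
from typing import Callable, List, Tuple

# (features of a state-action pair, teacher feedback in {+1, -1})
Sample = Tuple[Tuple[float, ...], int]


def filter_minibatch(
    batch: List[Sample],
    predict_positive: Callable[[Tuple[float, ...]], float],
    confidence: float = 0.9,
) -> List[Sample]:
    """Relabel samples whose feedback the classifier confidently contradicts.

    predict_positive stands in for a classifier and returns the estimated
    probability that the feedback should be positive. Assumed rule only.
    """
    filtered = []
    for features, label in batch:
        p = predict_positive(features)
        predicted = 1 if p >= 0.5 else -1
        certainty = max(p, 1.0 - p)
        if predicted != label and certainty >= confidence:
            filtered.append((features, predicted))  # treat as noisy, relabel
        else:
            filtered.append((features, label))  # keep the teacher's label
    return filtered
```

In this sketch the filtered minibatch would then be passed on to whatever policy and classifier updates the learner performs; those updates are outside the scope of the example.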
