Multi Task Inverse Reinforcement Learning for Common Sense Reward

Neta Glazer, Aviv Navon, Aviv Shamsian, Ethan Fetaya

TL;DR

Reward design in RL is prone to misalignment and reward hacking. This paper proposes disentangling the reward into a task-specific component and a shared common-sense component (the cs-reward), and learns the latter via multi-task inverse reinforcement learning (MT-CSIRL) using a discriminator shared across tasks. The paper shows that standard single-task IRL fails to produce a transferable cs-reward, while multi-task setups, especially MT-CSIRL and MT-CSIRL+LT, produce cs-rewards that transfer to unseen targets and tasks and, in qualitative analyses, correlate strongly with the ground-truth signals. The evidence is empirical, on Meta-World with synthetic cs-rewards. The paper also covers curriculum learning and an extension to unknown task rewards, pointing to practical benefits for safer, better-aligned RL systems.
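To make the idea of a discriminator shared across tasks concrete, here is a minimal sketch in plain Python. It is not the paper's architecture: the single hand-picked feature (the action norm), the logistic model, the fixed agent samples, and all names (`SharedDiscriminator`, `train_shared`, the toy task data) are assumptions made for illustration. Only the discriminator side of adversarial IRL is shown; policy updates are omitted.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class SharedDiscriminator:
    """Logistic discriminator on one feature (action norm). Hypothetical."""

    def __init__(self):
        self.w = 0.0
        self.b = 0.0

    def logit(self, action_norm):
        # The logit plays the role of a learned cs-reward in this sketch.
        return self.w * action_norm + self.b

    def update(self, expert_batch, agent_batch, lr=0.1):
        # One gradient step on binary cross-entropy:
        # expert samples are labeled 1, agent samples are labeled 0.
        samples = [(x, 1.0) for x in expert_batch]
        samples += [(x, 0.0) for x in agent_batch]
        grad_w = 0.0
        grad_b = 0.0
        for x, y in samples:
            err = sigmoid(self.logit(x)) - y
            grad_w += err * x
            grad_b += err
        self.w -= lr * grad_w / len(samples)
        self.b -= lr * grad_b / len(samples)


def train_shared(tasks, steps=500, seed=0):
    """Train ONE discriminator on batches drawn from every task.

    tasks maps a task name to (expert_action_norms, agent_action_norms).
    """
    rng = random.Random(seed)
    disc = SharedDiscriminator()
    names = sorted(tasks)
    for _ in range(steps):
        expert, agent = tasks[rng.choice(names)]
        disc.update(expert, agent)
    return disc


# Toy data (made up): experts in both tasks use small actions,
# the untrained agents use large ones.
toy_tasks = {
    "task_a": ([0.4, 0.5, 0.6], [1.5, 1.8, 2.0]),
    "task_b": ([0.45, 0.55], [1.2, 1.6, 1.9]),
}
disc = train_shared(toy_tasks)
```

Because the same parameters see data from every task, the discriminator can only separate experts from agents using what the tasks have in common, which in this toy setup is the action norm. After training, `disc.logit` assigns a higher score to expert-like action norms than to agent-like ones.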

Abstract

One of the challenges in applying reinforcement learning in a complex real-world environment lies in providing the agent with a sufficiently detailed reward function. Any misalignment between the reward and the desired behavior can result in unwanted outcomes. This may lead to issues like "reward hacking", where the agent maximizes rewards through unintended behavior. In this work, we propose to disentangle the reward into two distinct parts: a simple task-specific reward, outlining the particulars of the task at hand, and an unknown common-sense reward, indicating the expected behavior of the agent within the environment. We then explore how this common-sense reward can be learned from expert demonstrations. We first show that inverse reinforcement learning, even when it succeeds in training an agent, does not learn a useful reward function. That is, training a new agent with the learned reward does not impart the desired behaviors. We then demonstrate that this problem can be solved by training simultaneously on multiple tasks. That is, multi-task inverse reinforcement learning can be applied to learn a useful reward function.
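The two-part reward described in the abstract can be sketched as a sum of a task reward and a common-sense reward. The code below is an illustration under stated assumptions, not the paper's definition: the additive form with a `weight` parameter, the action-norm penalty used as the cs-reward, the toy reach task, and every function name are hypothetical.

```python
def l2_norm(vector):
    return sum(x * x for x in vector) ** 0.5


def make_action_norm_cs_reward(target_norm):
    """Hypothetical cs-reward: zero when ||action|| hits the target, negative otherwise."""

    def cs_reward(state, action):
        return -abs(l2_norm(action) - target_norm)

    return cs_reward


def reach_task_reward(state, action):
    """Hypothetical task reward: negative distance of state[0] from a goal at 1.0."""
    return -abs(state[0] - 1.0)


def combined_reward(task_reward, cs_reward, weight=1.0):
    """Assumed additive combination: r = r_task + weight * r_cs."""

    def reward(state, action):
        return task_reward(state, action) + weight * cs_reward(state, action)

    return reward


reward = combined_reward(reach_task_reward, make_action_norm_cs_reward(5.0))

# At the goal with an action of norm exactly 5: both parts are zero.
at_goal = reward([1.0], [3.0, 4.0])
# Away from the goal with a zero action: task part is -1, cs part is -5.
off_goal = reward([0.0], [0.0, 0.0])
```

Keeping the two parts as separate callables mirrors the setting in the abstract: the task reward is simple and known, while the cs-reward is the unknown piece that could be swapped for one learned from expert demonstrations.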

Paper Structure

This paper contains 26 sections, 10 equations, 9 figures, 5 tables, and 2 algorithms.

Figures (9)

  • Figure 1: MT-CSIRL architecture overview
  • Figure 2: Single task cs-reward: In (a) and (b), we plot the rewards during the IRL process (IRL Agent), an RL agent with the learned cs-reward from the IRL process (Transferred), and a baseline RL agent without any common-sense component (SAC). The red horizontal line represents the target value of the velocity/action norm; see Eqs. \ref{eq:gt_vel} and \ref{eq:gt_an}. The scatter plot (c) shows the correlation between the ground-truth reward and the learned cs-reward.
  • Figure 3: Ground-truth cs-reward: In (a) and (b), we show the ground-truth common-sense behavior in three different experiments. Expert: maximizes the task reward and the ground-truth cs-reward. MT-CSIRL: trained with the learned cs-reward and the task reward. SAC: "vanilla" training, with the task reward only. In (c), we show a scatter plot of the ground-truth cs-reward against the cs-reward learned by MT-CSIRL in the velocity experiment.
  • Figure 4: Single task cs-reward: Visualization of the rewards during the IRL process (IRL Agent), an RL agent with the learned cs-reward from the IRL process (Transferred), and a baseline RL agent without any common-sense component (SAC). The red horizontal line represents the target value in the ground-truth reward; see Eqs. \ref{eq:gt_vel} and \ref{eq:gt_an}. This experiment was conducted on the button-press-topdown-wall task.
  • Figure 5: Single task cs-reward: Visualization of the rewards during the IRL process (IRL Agent), an RL agent with the learned cs-reward from the IRL process (Transferred), and a baseline RL agent without any common-sense component (SAC). The red horizontal line represents the target value in the ground-truth reward; see Eqs. \ref{eq:gt_vel} and \ref{eq:gt_an}. This experiment was conducted on the coffee-button task.
  • ...and 4 more figures