SHIRE: Enhancing Sample Efficiency using Human Intuition in REinforcement Learning

Amogh Joshi, Adarsh Kumar Kosta, Kaushik Roy

TL;DR

SHIRE introduces a compute-efficient framework that encodes human intuition as Probabilistic Graphical Models (Intuition Nets) and integrates an Intuition Loss into PPO to accelerate Deep RL in robotic tasks. The approach improves sample efficiency by 25-78% across the evaluated environments at negligible overhead, and enhances policy explainability by teaching agents to exhibit the encoded elementary behaviors. Empirical results cover CartPole, MountainCar, Lunar Lander, Swimmer, and Taxi, including a real-world TurtleBot demonstration for the Taxi task. The method addresses both the data-efficiency and interpretability gaps in Deep RL, providing a practical pathway for rapid and transparent policy development in safety-critical robotics applications.

Abstract

The ability of neural networks to perform robotic perception and control tasks such as depth and optical flow estimation, simultaneous localization and mapping (SLAM), and automatic control has led to their widespread adoption in recent years. Deep Reinforcement Learning has been used extensively in these settings, as it does not have the unsustainable training costs associated with supervised learning. However, Deep RL suffers from poor sample efficiency, i.e., it requires a large number of environment interactions to converge to an acceptable solution. Modern RL algorithms such as Deep Q-Learning and Soft Actor-Critic attempt to remedy this shortcoming but cannot provide the explainability required in applications such as autonomous robotics. Humans intuitively understand the long-time-horizon sequential tasks common in robotics. Properly using such intuition can make RL policies more explainable while enhancing their sample efficiency. In this work, we propose SHIRE, a novel framework for encoding human intuition using Probabilistic Graphical Models (PGMs) and using it in the Deep RL training pipeline to enhance sample efficiency. Our framework achieves 25-78% sample efficiency gains across the environments we evaluate at negligible overhead cost. Additionally, by teaching RL agents the encoded elementary behavior, SHIRE enhances policy explainability. A real-world demonstration further highlights the efficacy of policies trained using our framework.
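
To make the idea of adding an intuition term to a policy-optimization objective concrete, the following is a minimal sketch and not the authors' formulation. It assumes the standard PPO clipped surrogate and a penalty equal to the average probability mass the policy places on actions that contradict a hand-written rule. The rule itself (push the cart toward the side the pole leans), the function names, and the weight value are all hypothetical stand-ins for illustration; the paper's Intuition Nets are Probabilistic Graphical Models whose exact structure is not given in this summary.

```python
import numpy as np


def ppo_clip_loss(ratio, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate, negated so that lower is better."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return -float(np.mean(np.minimum(unclipped, clipped)))


def cartpole_rule_mask(states):
    """Hypothetical intuition rule (an assumption, not taken from the paper).

    CartPole observations are [cart position, cart velocity, pole angle,
    pole angular velocity]; action 0 pushes left and action 1 pushes right.
    The rule marks as consistent the action that pushes toward the side
    the pole is leaning. Returns an (N, 2) mask with 1.0 on that action.
    """
    pole_angle = states[:, 2]
    preferred = (pole_angle > 0).astype(int)
    mask = np.zeros((states.shape[0], 2))
    mask[np.arange(states.shape[0]), preferred] = 1.0
    return mask


def intuition_penalty(action_probs, consistent_mask):
    """Mean probability mass assigned to actions that contradict the rule."""
    violating_mass = np.sum(action_probs * (1.0 - consistent_mask), axis=1)
    return float(np.mean(violating_mass))


def combined_loss(ratio, advantage, action_probs, consistent_mask, weight=0.5):
    """PPO loss plus a weighted intuition penalty (weight is an assumed knob)."""
    return ppo_clip_loss(ratio, advantage) + weight * intuition_penalty(
        action_probs, consistent_mask
    )


if __name__ == "__main__":
    # Two made-up states: pole leaning right, then pole leaning left.
    states = np.array([[0.0, 0.0, 0.1, 0.0], [0.0, 0.0, -0.1, 0.0]])
    # Made-up policy outputs: P(left), P(right) for each state.
    action_probs = np.array([[0.2, 0.8], [0.3, 0.7]])
    ratio = np.array([1.0, 1.0])
    advantage = np.array([1.0, -1.0])
    mask = cartpole_rule_mask(states)
    print(combined_loss(ratio, advantage, action_probs, mask))
```

In an actual training loop the same penalty would be computed on the policy network's output inside an automatic-differentiation framework so that its gradient reaches the policy parameters; NumPy is used here only to keep the arithmetic easy to inspect.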

Paper Structure

This paper contains 19 sections, 9 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Intuition Encoding for the Lunar Lander Environment
  • Figure 2: Integration of Intuition Loss with existing RL Policy Optimization Algorithms
  • Figure 3: Gymnasium environments used for evaluating the SHIRE framework. From left to right: CartPole, MountainCar, LunarLander, Swimmer, and Taxi.