
PLANRL: A Motion Planning and Imitation Learning Framework to Bootstrap Reinforcement Learning

Amisha Bhaskar, Zahiruddin Mahammad, Sachin R Jadhav, Pratap Tokekar

TL;DR

PLANRL is a framework that chooses when the robot should use classical motion planning and when it should learn a policy. By combining the strengths of RL and Imitation Learning (IL), it improves sample efficiency and mitigates distribution shift, ensuring robust task execution.

Abstract

Reinforcement Learning (RL) has shown remarkable progress in simulation environments, yet its application to real-world robotic tasks remains limited due to challenges in exploration and generalization. To address these issues, we introduce PLANRL, a framework that chooses when the robot should use classical motion planning and when it should learn a policy. To further improve exploration efficiency, we use imitation data to bootstrap exploration. PLANRL dynamically switches between two modes of operation: reaching a waypoint using classical techniques when away from the objects, and reinforcement learning for fine-grained manipulation control when about to interact with objects. The PLANRL architecture is composed of ModeNet for mode classification, NavNet for waypoint prediction, and InteractNet for precise manipulation. By combining the strengths of RL and Imitation Learning (IL), PLANRL improves sample efficiency and mitigates distribution shift, ensuring robust task execution. We evaluate our approach across multiple challenging simulation environments and real-world tasks, demonstrating superior performance in terms of adaptability, efficiency, and generalization compared to existing methods. In simulations, PLANRL surpasses baseline methods by 10-15% in training success rates at 30k samples and by 30-40% during evaluation phases. In real-world scenarios, it demonstrates a 30-40% higher success rate on simpler tasks compared to baselines and uniquely succeeds in complex, two-stage manipulation tasks. Datasets and supplementary materials can be found on our project website (https://raaslab.org/projects/NAVINACT/).

Paper Structure (23 sections, 10 figures, 1 algorithm)

Figures (10)

  • Figure 1: Architecture of PLANRL: During training, PLANRL learns to predict waypoints, low-level actions, and the operational mode at each time step. One network (InteractNet) predicts the low-level action $a_t$ and the other network (ModeNet) predicts mode $m_t$. A separate network (NavNet) predicts the high-level waypoint $w_t$. At test time, the system samples $m_t$ and either moves to a waypoint (when $m_t = 0$) using the predicted waypoint or follows a dense action (when $m_t = 1$). The architecture allows for dynamic switching between motion-planning and interaction modes, facilitating robust performance in complex tasks. An example of how motion planning and interaction modes are integrated during execution is shown on the right.
  • Figure 2: ModeNet architecture designed to classify modes (motion-planning vs. interact) based on visual inputs.
  • Figure 3: NavNet architecture for predicting waypoints to guide high-level motion-planning tasks.
  • Figure 4: Analysis of action selection from Behavior Cloning (BC) policy during NAVINACT training. The figures show the proportion of actions taken from the BC and RL policy across different tasks, providing insight into the system's reliance on BC during training.
  • Figure 5: Sequence of images illustrating ModeNet's predicted modes during trajectory execution.
  • ...and 5 more figures
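
The test-time behavior described in the Figure 1 caption (move to the predicted waypoint when $m_t = 0$, follow the dense action when $m_t = 1$) can be illustrated with a minimal sketch. This is not the authors' implementation: the class `ModeSwitchingController`, the command tuples, and the three `toy_*` functions are hypothetical stand-ins for the trained ModeNet, NavNet, and InteractNet, and the distance threshold is an arbitrary assumption made only so the example runs.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

MOTION_PLANNING = 0  # m_t = 0: move to the predicted waypoint w_t
INTERACTION = 1      # m_t = 1: follow the dense low-level action a_t

Observation = Sequence[float]
Command = Tuple[str, Tuple[float, ...]]


@dataclass
class ModeSwitchingController:
    """Hypothetical dispatcher over three learned components."""
    mode_net: Callable[[Observation], int]                  # stand-in for ModeNet
    nav_net: Callable[[Observation], Sequence[float]]       # stand-in for NavNet
    interact_net: Callable[[Observation], Sequence[float]]  # stand-in for InteractNet

    def step(self, obs: Observation) -> Command:
        # Pick the mode first, then query only the network that mode needs.
        mode = self.mode_net(obs)
        if mode == MOTION_PLANNING:
            # The waypoint would be handed to a classical motion planner.
            return ("waypoint", tuple(self.nav_net(obs)))
        if mode == INTERACTION:
            # The dense action would be sent directly to the robot.
            return ("action", tuple(self.interact_net(obs)))
        raise ValueError(f"unknown mode: {mode}")


# Toy stand-ins (assumptions): obs[0] is a made-up distance to the object.
def toy_mode_net(obs: Observation) -> int:
    return MOTION_PLANNING if obs[0] > 0.1 else INTERACTION


def toy_nav_net(obs: Observation) -> Sequence[float]:
    return [0.0, 0.0, 0.2]  # fixed illustrative waypoint (x, y, z)


def toy_interact_net(obs: Observation) -> Sequence[float]:
    return [0.01, 0.0, -0.01]  # fixed illustrative end-effector delta


controller = ModeSwitchingController(toy_mode_net, toy_nav_net, toy_interact_net)
far_command = controller.step([0.5])    # far from the object
near_command = controller.step([0.05])  # close to the object
```

In this sketch the mode is whatever `mode_net` returns; the caption says the system samples $m_t$, so a faithful version would presumably draw the mode from ModeNet's predicted distribution inside that callable.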