
Enhancing Task Performance of Learned Simplified Models via Reinforcement Learning

Hien Bui, Michael Posa

TL;DR

This work parameterizes the stochastic policy as the perturbed output of the MPC controller, so that the learned model representation is tied directly to the policy and thus to task performance. This significantly improves task success rate.

Abstract

In contact-rich tasks, the hybrid, multi-modal nature of contact dynamics poses great challenges for model representation, planning, and control. Recent efforts have addressed these challenges with data-driven methods that learn dynamical models and use them within model predictive control (MPC). Those methods, while effective, rely solely on minimizing forward prediction error in the hope that it leads to better task performance with MPC controllers. Because prediction error is only weakly correlated with task performance, this can result in data inefficiency and can limit overall performance. In response, we propose a novel strategy: using a policy gradient algorithm to find a simplified dynamics model that explicitly maximizes task performance. Specifically, we parameterize the stochastic policy as the perturbed output of the MPC controller, so that the learned model representation is tied directly to the policy and thus to task performance. We apply the proposed method to contact-rich tasks in which a three-fingered robotic hand manipulates previously unknown objects. Compared to the existing method, our method improves task success rate by up to 15% when manipulating diverse objects, while sustaining data efficiency. It can solve some tasks with success rates of 70% or higher using under 30 minutes of data. All videos and code are available at https://sites.google.com/view/lcs-rl.
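The policy parameterization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scalar linear model x' = a*x + b*u, the closed-form one-step solve standing in for MPC, and the names `mpc_action`, `sample_action`, `log_prob`, `theta`, `r`, and `sigma` are all assumptions made for illustration. The point it shows is that the action distribution is a Gaussian centered on the controller output, so the log-probability used by a policy gradient method depends on the model parameters through the controller.

```python
import math
import random


def mpc_action(theta, x, x_goal, r=0.01):
    """Hypothetical stand-in for an MPC solve on the scalar model x' = a*x + b*u.

    Minimizes (a*x + b*u - x_goal)**2 + r*u**2 over u in closed form.
    """
    a, b = theta
    return b * (x_goal - a * x) / (b * b + r)


def sample_action(theta, x, x_goal, sigma, rng):
    """Draw u ~ N(mpc_action(...), sigma**2): controller output plus Gaussian noise."""
    return mpc_action(theta, x, x_goal) + sigma * rng.gauss(0.0, 1.0)


def log_prob(u, theta, x, x_goal, sigma):
    """Gaussian log-density of action u under the perturbed-controller policy.

    The mean depends on the model parameters theta, so this quantity ties
    the model to the policy objective.
    """
    mean = mpc_action(theta, x, x_goal)
    return (-0.5 * ((u - mean) / sigma) ** 2
            - math.log(sigma)
            - 0.5 * math.log(2.0 * math.pi))


if __name__ == "__main__":
    rng = random.Random(0)
    theta = (1.0, 1.0)  # assumed model parameters (a, b)
    u = sample_action(theta, x=0.0, x_goal=1.0, sigma=0.1, rng=rng)
    print(u, log_prob(u, theta, 0.0, 1.0, 0.1))
```

In this toy setting, changing `theta` shifts the mean of the action distribution, which is what lets a policy gradient update adjust the model toward higher task reward rather than only toward lower prediction error.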

Paper Structure (25 sections, 14 equations, 4 figures, 1 table)


Figures (4)

  • Figure 1: Our proposed framework for learning simplified dynamic models to solve contact-rich manipulation tasks in low-data regimes. The framework is an iterative learning loop with two main components: a stochastic policy and a policy optimizer. Top panel: the stochastic policy is constructed by running the learned dynamic model under an MPC scheme and adding Gaussian noise. Bottom panel: the policy parameters are optimized on the collected on-policy data by combining PPO with a prediction loss.
  • Figure 2: TriFinger dexterous manipulation tasks. (a) The simulation environment, built with the MuJoCo physics engine [todorov2012mujoco]. In this task, the three fingers need to push the cube towards a random target pose, visualized by the red transparent cube. (b) An example rollout trajectory showing how the fingers approach, make, and break contact to reposition the cube.
  • Figure 3: Learning curves of the TriFinger Moving Cube task. The red, blue, orange, and green lines show the average task success rate of our proposed method, the prior method [Jin2024], a method that uses PPO without a warm-up phase, and PDDM [Nagabandi.etal2020], respectively. At the beginning of training, our method and the prior method [Jin2024] share the same performance because they use the same algorithm. The transition occurs after 6 minutes of data have been collected, when our method switches to fully employing the PPO algorithm. Shaded regions indicate 95% confidence intervals computed from t-scores.
  • Figure 4: Comparison of task performance on the YCB objects between learning the LCS model from scratch and starting from pre-trained models (obtained by training on the TriFinger Moving Cube task), using our LCS-RL framework.
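
The Figure 1 caption mentions combining PPO with a prediction loss. A minimal sketch of such a combined objective follows, assuming the standard PPO clipped surrogate and a mean squared prediction error. The weight `pred_weight`, the clip range `clip_eps`, the squared-error form, and all function names are assumptions for illustration; the page does not state how the authors weight or define the two terms.

```python
import math


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate, averaged over samples."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


def prediction_loss(predicted, observed):
    """Mean squared one-step prediction error (assumed form)."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def combined_loss(logp_new, logp_old, advantages, predicted, observed,
                  pred_weight=1.0, clip_eps=0.2):
    """PPO term plus a weighted prediction term; pred_weight is hypothetical."""
    return (ppo_clip_loss(logp_new, logp_old, advantages, clip_eps)
            + pred_weight * prediction_loss(predicted, observed))


if __name__ == "__main__":
    loss = combined_loss(
        logp_new=[-1.0, -0.5], logp_old=[-1.1, -0.4],
        advantages=[0.3, -0.2],
        predicted=[0.9, 2.1], observed=[1.0, 2.0],
        pred_weight=0.5,
    )
    print(loss)
```

Here the log-probabilities would come from the perturbed-MPC policy, so minimizing the combined loss with respect to the model parameters trades off task reward against forward prediction accuracy.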