Robotic Arm Manipulation with Inverse Reinforcement Learning & TD-MPC

Md Shoyib Hassan, Sabir Md Sanaullah

TL;DR

The paper tackles the difficulty of scaling model-based IRL to real robotic manipulation by learning cost functions from visual demonstrations and optimizing with a temporal-difference visual MPC framework that employs a keypoint-based latent state and a pre-trained dynamics model. It introduces a gradient-based IRL mechanism that differentiates through the inner optimization to update cost parameters, and supplements this with an adversarial IRL variant using TD-MPC. Key contributions include a compact, keypoint-based visual representation, a latent dynamics model, and a gradient-based bi-level optimization approach for IRL in vision-based manipulation, demonstrated on a simulated Franka Panda task. The work advances sample efficiency and generalization in visual IRL for robotics and highlights practical avenues for robust visual prediction, viewpoint invariance, and potential natural-language command integration to broaden applicability.

Abstract

How to scale model-based inverse reinforcement learning (IRL) to real robotic manipulation tasks with unpredictable dynamics remains an open problem. The main obstacles are learning from both visual and proprioceptive examples, designing algorithms that scale to high-dimensional state spaces, and learning strong dynamics models. In this work, we present a gradient-based inverse reinforcement learning framework that learns cost functions purely from visual human demonstrations. The demonstrated behavior is then reproduced by optimizing the trajectory with TD visual model predictive control (MPC) and the learned cost functions. We evaluate our system on basic object manipulation tasks on hardware.

Paper Structure

This paper contains 26 sections, 9 equations, 2 figures, and 1 algorithm.

Figures (2)

  • Figure 1: A basic overview of our keypoint-based visual model predictive control framework for AIRL. Actions are optimized with the cross-entropy method on the cost function.
  • Figure 2: Panels (a) to (k) show the robot's progressive performance in one test instance, after training with the proposed IRL method. Panel (l) shows how the loss and reward change during training as a function of the number of episodes.
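
The Figure 1 caption mentions optimizing actions via cross-entropy on the cost function. A minimal sketch of that kind of planner is given below, under loud assumptions: the toy point-mass dynamics and the quadratic distance-to-goal cost are stand-ins for the paper's latent keypoint dynamics model and learned cost, and every name (`cem_plan`, `rollout_cost`, `GOAL`, `Z0`) is hypothetical rather than taken from the authors' code.

```python
import numpy as np

# Hypothetical stand-ins: a 2-D "keypoint" state, a start, and a goal location.
Z0 = np.zeros(2)
GOAL = np.array([1.0, -0.5])

def rollout_cost(actions):
    """Roll out toy dynamics z_{t+1} = z_t + a_t and sum squared distance to GOAL.

    In the paper's setting this would be a learned latent dynamics model and a
    learned cost; both are replaced by simple assumptions here.
    """
    z = Z0.copy()
    cost = 0.0
    for a in actions:
        z = z + a
        cost += float(np.sum((z - GOAL) ** 2))
    return cost

def cem_plan(cost_fn, horizon, action_dim, iters=10, pop=256, n_elite=32, seed=0):
    """Cross-entropy method over action sequences; returns the final mean plan."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iters):
        # Sample candidate action sequences from the current Gaussian.
        noise = rng.standard_normal((pop, horizon, action_dim))
        samples = mean + std * noise
        costs = np.array([cost_fn(s) for s in samples])
        # Keep the lowest-cost candidates and refit the Gaussian to them.
        elite = samples[np.argsort(costs)[:n_elite]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6  # small floor avoids a degenerate Gaussian
    return mean

plan = cem_plan(rollout_cost, horizon=5, action_dim=2)
baseline = rollout_cost(np.zeros((5, 2)))  # cost of doing nothing
print("planned cost:", rollout_cost(plan), "baseline cost:", baseline)
```

In a receding-horizon MPC loop, only the first action of `plan` would typically be executed before replanning from the new state; the population size, elite count, and iteration count above are illustrative defaults, not values reported in the paper.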