Robotic Arm Manipulation with Inverse Reinforcement Learning & TD-MPC
Md Shoyib Hassan, Sabir Md Sanaullah
TL;DR
The paper tackles the difficulty of scaling model-based IRL to real robotic manipulation by learning cost functions from visual demonstrations and optimizing with a temporal-difference visual MPC framework that employs a keypoint-based latent state and a pre-trained dynamics model. It introduces a gradient-based IRL mechanism that differentiates through the inner optimization to update cost parameters, and supplements this with an adversarial IRL variant using TD-MPC. Key contributions include a compact, keypoint-based visual representation, a latent dynamics model, and a gradient-based bi-level optimization approach for IRL in vision-based manipulation, demonstrated on a simulated Franka Panda task. The work advances sample efficiency and generalization in visual IRL for robotics and highlights practical avenues for robust visual prediction, viewpoint invariance, and potential natural-language command integration to broaden applicability.
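The bi-level idea summarized above (an inner planner that optimizes actions under the current cost, and an outer update that differentiates through that planner to change the cost parameters) can be illustrated on a toy problem. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the latent state is a single scalar, the hypothetical dynamics z_{t+1} = z_t + u_t stand in for the pre-trained latent model, the learned cost has a single parameter `theta` (a goal location), and the inner optimizer is plain gradient descent on the actions rather than TD-MPC. Instead of automatic differentiation, the derivative of every inner step with respect to `theta` is carried along by hand (forward mode). The names `rollout`, `plan`, and `irl` are hypothetical.

```python
"""Toy bi-level IRL: learn a cost parameter by differentiating through an
unrolled gradient-descent planner (forward mode, hand-derived tangents).

Hypothetical setup, not the paper's: a 1-D latent state, additive dynamics
z_{t+1} = z_t + u_t standing in for a pre-trained latent model, and a cost
whose only learnable parameter is a goal location theta.
"""

T = 5         # planning horizon
LAM = 0.1     # action-penalty weight
ALPHA = 0.02  # inner (planner) step size
K = 200       # inner gradient steps
BETA = 0.05   # outer (IRL) step size


def rollout(z0, u):
    """States z_1..z_T under the stand-in dynamics z_{t+1} = z_t + u_t."""
    zs, z = [], z0
    for ut in u:
        z += ut
        zs.append(z)
    return zs


def plan(z0, theta):
    """Inner loop: K gradient steps on the actions for
    J(u; theta) = sum_t (z_t - theta)^2 + LAM * sum_t u_t^2,
    carrying the tangents v_s = du_s/dtheta through every step.
    Returns the planned states z_1..z_T and their derivatives dz_t/dtheta."""
    u, v = [0.0] * T, [0.0] * T
    for _ in range(K):
        zs = rollout(z0, u)
        ws = rollout(0.0, v)  # dz_t/dtheta = sum_{s<t} v_s
        # u_s influences every later state, i.e. list entries s..T-1
        g = [sum(2 * (zs[t] - theta) for t in range(s, T)) + 2 * LAM * u[s]
             for s in range(T)]
        # derivative of each gradient entry g_s with respect to theta
        dg = [sum(2 * (ws[t] - 1.0) for t in range(s, T)) + 2 * LAM * v[s]
              for s in range(T)]
        u = [u[s] - ALPHA * g[s] for s in range(T)]
        v = [v[s] - ALPHA * dg[s] for s in range(T)]
    return rollout(z0, u), rollout(0.0, v)


def irl(z0, demo, theta=0.0, iters=100):
    """Outer loop: fit theta so the planned states match the demonstration."""
    for _ in range(iters):
        zs, ws = plan(z0, theta)
        # dL/dtheta for L = sum_t (z_t - demo_t)^2, chained through the planner
        grad = sum(2 * (z - d) * w for z, d, w in zip(zs, demo, ws))
        theta -= BETA * grad
    return theta


if __name__ == "__main__":
    demo, _ = plan(0.0, 1.0)  # synthetic "expert" generated with theta* = 1.0
    print(f"recovered theta = {irl(0.0, demo):.4f}")
```

In this toy setting every inner step is affine in `theta`, so the outer loss is a quadratic minimized at the expert's parameter and `irl` recovers a value close to 1.0. In a real vision-based system the hand-written tangents would be replaced by an autodiff framework, and the outer loss could compare, for example, predicted and demonstrated keypoint trajectories.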
Abstract
A key open problem is how to scale model-based inverse reinforcement learning (IRL) to real robotic manipulation tasks with unpredictable dynamics. The main obstacles are learning from both visual and proprioceptive demonstrations, designing algorithms that scale to high-dimensional state spaces, and learning accurate dynamics models. In this work, we present a gradient-based IRL framework that learns cost functions purely from visual human demonstrations. The demonstrated behavior is then reproduced by optimizing trajectories with temporal-difference (TD) visual model predictive control (MPC) under the learned cost functions. We evaluate our system on fundamental object manipulation tasks on hardware.
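To make the planning step concrete, the following is a simplified sketch of receding-horizon MPC with a learned cost and a terminal value, again on a hypothetical 1-D latent state. It is not TD-MPC itself: TD-MPC plans with an MPPI-style sampler over a learned latent model and bootstraps the return with a TD-learned value function, whereas here a plain cross-entropy method (CEM) planner is swapped in, and `dynamics`, `cost`, and `terminal_value` are hand-written stand-ins for learned components. All names and constants are assumptions made for illustration.

```python
import random

H = 5            # planning horizon
N_SAMPLES = 128  # candidate action sequences per iteration
N_ELITES = 16    # best candidates kept to refit the sampling distribution
N_ITERS = 8      # refinement iterations per planning call
GAMMA = 0.95     # discount factor
GOAL = 1.0       # hypothetical goal location of the latent keypoint


def dynamics(z, u):
    """Hypothetical stand-in for a pre-trained latent dynamics model."""
    return z + u


def cost(z, u):
    """Hypothetical learned per-step cost (e.g. the output of IRL)."""
    return (z - GOAL) ** 2 + 0.1 * u ** 2


def terminal_value(z):
    """Hypothetical cost-to-go beyond the horizon, assuming the state simply
    stays at z forever; TD-MPC instead learns this quantity with TD targets."""
    return (z - GOAL) ** 2 / (1.0 - GAMMA)


def score(z0, actions):
    """Discounted cost of an action sequence plus the terminal estimate."""
    z, total = z0, 0.0
    for t, u in enumerate(actions):
        z = dynamics(z, u)
        total += GAMMA ** t * cost(z, u)
    return total + GAMMA ** len(actions) * terminal_value(z)


def cem_plan(z0, rng):
    """CEM over action sequences; returns only the first action."""
    mu, sigma = [0.0] * H, [1.0] * H
    for _ in range(N_ITERS):
        samples = [[rng.gauss(mu[t], sigma[t]) for t in range(H)]
                   for _ in range(N_SAMPLES)]
        # keep the lowest-cost sequences and refit a diagonal Gaussian to them
        elites = sorted(samples, key=lambda a: score(z0, a))[:N_ELITES]
        mu = [sum(a[t] for a in elites) / N_ELITES for t in range(H)]
        sigma = [max(1e-3, (sum((a[t] - mu[t]) ** 2 for a in elites) / N_ELITES) ** 0.5)
                 for t in range(H)]
    return mu[0]


def run(z0=0.0, steps=10, seed=0):
    """Closed loop: re-plan at every step and apply only the first action."""
    rng, z = random.Random(seed), z0
    for _ in range(steps):
        z = dynamics(z, cem_plan(z, rng))
    return z


if __name__ == "__main__":
    print(f"final latent state: {run():.3f}")
```

Re-planning from the current state at every step and applying only the first action lets the controller correct model and sampling errors online. The terminal value is what distinguishes this TD-style setup from plain finite-horizon MPC: it assigns a cost to states beyond the short planning horizon.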
