DA-VIL: Adaptive Dual-Arm Manipulation with Reinforcement Learning and Variable Impedance Control

Md Faizal Karim, Shreya Bollimuntha, Mohammed Saad Hashmi, Autrio Das, Gaurav Singh, Srinath Sridhar, Arun Kumar Singh, Nagamanikandan Govindan, K Madhava Krishna

TL;DR

A novel pipeline combines policy learning from environment feedback with gradient-based optimization to learn the controller gains used to compute the control outputs. This lets the robotic system dynamically modulate its impedance in response to task demands, ensuring stability and dexterity in dual-arm operations.

Abstract

Dual-arm manipulation is an area of growing interest in the robotics community. Enabling robots to coordinate two arms is essential for complex manipulation tasks such as handling large objects, assembling components, and performing human-like interactions. However, effective dual-arm manipulation is challenging because it requires precise coordination, dynamic adaptability, and the ability to manage interaction forces between the arms and the objects being manipulated. We propose a novel pipeline that combines the advantages of policy learning based on environment feedback and gradient-based optimization to learn the controller gains required for the control outputs. This allows the robotic system to dynamically modulate its impedance in response to task demands, ensuring stability and dexterity in dual-arm operations. We evaluate our pipeline on a trajectory-tracking task involving a variety of large, complex objects with different masses and geometries. We then compare its performance against three established methods for controlling dual-arm robots and find that our pipeline achieves superior results.

Paper Structure

This paper contains 8 sections, 8 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Adaptive Dual-Arm Manipulation. We present a novel framework that integrates Reinforcement Learning with an optimization-based Variable Impedance Control for efficient and adaptive dual-arm manipulation. Our approach handles a diverse set of objects varying in shape and mass. We divide a pick-and-place task into three stages: Grasp, where the arms approach the object and grasp it based on the input grasp poses; Pick, where the arms lift the object to an intermediate waypoint; and Place, where the arms position the object at its goal pose.
  • Figure 2: Overview of the Proposed Method. (a) The pipeline shows how the policy network uses observations $O_t$, reward $R_t$, time, and mass embeddings to predict stiffness $K$. This stiffness, along with state variables from MuJoCo and the reference trajectory, is fed to the QP solver, which outputs joint accelerations $\boldsymbol{\ddot{q}}^{\star}_L$ and $\boldsymbol{\ddot{q}}^{\star}_R$. These accelerations are then converted to torques $\boldsymbol{\tau}_L$ and $\boldsymbol{\tau}_R$ and applied to the MuJoCo simulator. (b) The QP solver, implemented using CVXPY, computes impedance and postural errors from the provided stiffness and reference trajectory and solves the optimization problem (Equation \ref{eq:QP_problem}) with constraints (Equations \ref{eq:constraint_1}, \ref{eq:constraint_2}, \ref{eq:constraint_3}, \ref{eq:constraint_4}) to determine the joint accelerations. (c) The trajectory generation process uses a quintic polynomial for Cartesian positions and SLERP for $SO(3)$ orientations. The trajectory includes a waypoint whose $x$ and $y$ coordinates are the averages of the initial and goal positions, and whose $z$ coordinate is set to a random height within the workspace of the arms to introduce variability.
  • Figure 3: Qualitative comparison of our approach with baseline methods: Optimization-based Impedance Control (OIC), Impedance Control (IC), and RL-based Impedance Control (RL+IC). (a) Our framework successfully completes the pick-and-place task with the stool. (b) OIC achieves similar results but exhibits object slipping (zoomed in the red circle) when the arms are fully extended. (c) IC fails to complete the task because it cannot adapt impedance parameters dynamically, leading to poor object handling. (d) RL+IC completes the task without slipping but deviates from the reference trajectory mid-task.
  • Figure 4: Stiffness ($K$) values during pick-and-place of the chair (5 kg). In Stage 1, $K$ values are low at motion initiation. Stage 2 shows an increase in $K$ to reach the intermediate waypoint. Stage 3 sees $K$ return to initial levels during object placement.
  • Figure 5: Torque values of different joints from our method for a pick-and-place task with three different masses (5 kg, 2.5 kg, 0.5 kg). Torque values increase with object mass (joint indexing starts from Joint 1).
  • ...and 1 more figures
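
The trajectory generation described in the Figure 2(c) caption can be illustrated with a minimal sketch. The caption specifies only a quintic polynomial for Cartesian positions, SLERP for orientations, and a waypoint with averaged $x$, $y$ and a random $z$. Everything else here is an assumption made for illustration: the rest-to-rest boundary conditions (zero velocity and acceleration at both ends of a segment), the reuse of the same quintic scaling for the SLERP parameter, the (w, x, y, z) quaternion convention, and the hypothetical names `quintic_scaling`, `slerp`, `make_waypoint`, `reference_pose`, and `z_range`. It is not the authors' implementation.

```python
import math
import random


def quintic_scaling(t, duration):
    """Map time t in [0, duration] to s in [0, 1] with a quintic polynomial.

    Assumption: rest-to-rest motion, i.e. zero velocity and acceleration
    at both ends of the segment.
    """
    u = min(max(t / duration, 0.0), 1.0)
    return 10 * u**3 - 15 * u**4 + 6 * u**5


def slerp(q0, q1, s):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # q and -q encode the same rotation; take the shorter arc
        q1 = [-c for c in q1]
        dot = -dot
    if dot > 0.9995:  # nearly identical: fall back to linear interpolation
        q = [a + s * (b - a) for a, b in zip(q0, q1)]
    else:
        theta = math.acos(dot)
        w0 = math.sin((1.0 - s) * theta) / math.sin(theta)
        w1 = math.sin(s * theta) / math.sin(theta)
        q = [w0 * a + w1 * b for a, b in zip(q0, q1)]
    norm = math.sqrt(sum(c * c for c in q))
    return [c / norm for c in q]


def make_waypoint(p_start, p_goal, z_range, rng=random):
    """Waypoint with x, y averaged from start and goal and a random z.

    z_range is a hypothetical (z_min, z_max) bound standing in for the
    arms' workspace limits.
    """
    x = 0.5 * (p_start[0] + p_goal[0])
    y = 0.5 * (p_start[1] + p_goal[1])
    return [x, y, rng.uniform(*z_range)]


def reference_pose(t, duration, p0, p1, q0, q1):
    """Reference position and orientation on one segment at time t."""
    s = quintic_scaling(t, duration)
    position = [a + s * (b - a) for a, b in zip(p0, p1)]
    return position, slerp(q0, q1, s)
```

Under these assumptions, a full reference would chain two segments, start to waypoint and waypoint to goal, calling `reference_pose` on each. Rest-to-rest scaling makes the object pause at the waypoint; whether the original method does so is not stated in the caption.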