AVO: Amortized Value Optimization for Contact Mode Switching in Multi-Finger Manipulation

Adam Hung, Fan Yang, Abhinav Kumar, Sergio Aguilera Marinovic, Soshi Iba, Rana Soltani Zarrin, Dmitry Berenson

TL;DR

Amortized Value Optimization (AVO) adds a learned value function, which predicts the total future task performance, to the cost of the trajectory optimization at each planning step. Its gradients guide the optimizer toward states that minimize the cost of future sub-tasks.

Abstract

Dexterous manipulation tasks often require switching between different contact modes, such as rolling, sliding, sticking, or no contact. When formulating dexterous manipulation tasks as a trajectory optimization problem, a common approach is to decompose these tasks into sub-tasks for each contact mode, each of which is solved independently. Optimizing each sub-task independently can limit performance, as optimizing contact points, contact forces, or other variables without information about future sub-tasks can place the system in a state from which it is challenging to make progress on subsequent sub-tasks. Further, optimizing these sub-tasks is computationally expensive. To address these challenges, we propose Amortized Value Optimization (AVO), which introduces a learned value function that predicts the total future task performance. By incorporating this value function into the cost of the trajectory optimization at each planning step, the value function gradients guide the optimizer toward states that minimize the cost in future sub-tasks. This effectively bridges separately optimized sub-tasks and accelerates the optimization by reducing the amount of online computation needed. We validate AVO on a screwdriver grasping and turning task in both simulation and real-world experiments, and show improved performance even with 50% less computational budget compared to trajectory optimization without the value function.
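
The idea of adding a learned value term to the optimization cost can be illustrated with a toy sketch. This is not the authors' implementation: the linear value functions, the squared-distance running cost, the weights `w_mean` and `w_var`, the finite-difference gradient, and the plain gradient-descent loop are all assumptions chosen to keep the example self-contained. It only shows how the mean and variance of an ensemble prediction could enter a cost whose gradient then steers the optimizer.

```python
from statistics import fmean, pvariance

def running_cost(x, goal):
    """Hypothetical sub-task cost: squared distance to a goal state."""
    return sum((xi - gi) ** 2 for xi, gi in zip(x, goal))

def ensemble_predict(x, members):
    """Evaluate toy linear value functions V_m(x) = w_m . x + b_m.

    `members` is a list of (weights, bias) pairs (an assumed stand-in for
    trained networks). Returns the ensemble mean and population variance.
    """
    preds = [sum(w * xi for w, xi in zip(ws, x)) + b for ws, b in members]
    return fmean(preds), pvariance(preds)

def augmented_cost(x, goal, members, w_mean=1.0, w_var=1.0):
    """Sub-task cost plus weighted ensemble mean and variance (assumed form)."""
    mu, var = ensemble_predict(x, members)
    return running_cost(x, goal) + w_mean * mu + w_var * var

def numerical_gradient(f, x, eps=1e-6):
    """Central finite differences; a real system would use autodiff."""
    grad = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

def optimize(x0, goal, members, lr=0.05, steps=200):
    """Plain gradient descent on the augmented cost."""
    x = [float(v) for v in x0]
    cost = lambda z: augmented_cost(z, goal, members)
    for _ in range(steps):
        g = numerical_gradient(cost, x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x
```

In this toy setting, if every ensemble member predicts a future cost equal to the second state coordinate, the optimizer ends at a state that trades a small amount of sub-task cost for a lower predicted future cost, rather than stopping exactly at the sub-task goal.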

Paper Structure

This paper contains 18 sections, 5 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: AVO overview. We train a value function ensemble to predict the mean and variance of the long-horizon future cost of the task. We then use this value function to guide trajectory optimization for a delicate dexterous manipulation task with multiple contact modes. This both improves the quality of the trajectories and also amortizes the optimization.
  • Figure 2: Training and deployment processes for AVO. Trajectories $\tau$ for each sub-task with contact mode $c$ are saved, along with the shared final cost. For each contact mode, an ensemble of $M$ value functions is then trained, and at deployment time, the mean $\mu$ and variance $\sigma$ of the ensemble contribute to the optimization cost function.
  • Figure 3: Simulation and hardware environment setups for our screwdriver turning task. In simulation, we have full access to the state variables. In hardware, the robot hand provides proprioceptive data, and we observe the screwdriver orientation with motion capture cameras.
  • Figure 4: Boxplots comparing quaternion angle differences between the final state and the goal state for each method during the High Budget (a) and Low Budget (b) simulation experiments. The median value over 50 trials is labeled for each method.
  • Figure 5: Boxplot comparing quaternion angle differences between the final state and the goal state for each method during hardware evaluations. The median value over 10 trials is labeled for each method.
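
The captions for Figures 4 and 5 report a quaternion angle difference between the final and goal orientations. The page does not state the exact formula, so the sketch below assumes the standard geodesic angle between two unit quaternions in (w, x, y, z) order; the function name and the normalization step are illustrative choices, not taken from the paper.

```python
import math

def quaternion_angle_difference(q1, q2):
    """Smallest rotation angle (radians) taking orientation q1 to q2.

    Quaternions are (w, x, y, z) tuples. Inputs are normalized here, and the
    absolute value of the dot product is used because q and -q represent the
    same rotation.
    """
    n1 = math.sqrt(sum(c * c for c in q1))
    n2 = math.sqrt(sum(c * c for c in q2))
    dot = abs(sum(a * b for a, b in zip(q1, q2))) / (n1 * n2)
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return 2.0 * math.acos(dot)
```

For example, comparing the identity orientation with a 90-degree rotation about the z-axis gives an angle of pi/2 radians, and comparing a quaternion with its negation gives zero.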