Task-Driven Hybrid Model Reduction for Dexterous Manipulation

Wanxin Jin, Michael Posa

TL;DR

Inspired by the observation that far fewer modes are actually necessary to accomplish many tasks, this article finds a reduced-order hybrid model requiring only a limited number of task-relevant modes.

Abstract

In contact-rich tasks, like dexterous manipulation, the hybrid nature of making and breaking contact creates challenges for model representation and control. For example, choosing and sequencing contact locations for in-hand manipulation, where there are thousands of potential hybrid modes, is not generally tractable. In this paper, we are inspired by the observation that far fewer modes are actually necessary to accomplish many tasks. Building on our prior work learning hybrid models, represented as linear complementarity systems, we find a reduced-order hybrid model requiring only a limited number of task-relevant modes. This simplified representation, in combination with model predictive control, enables real-time control yet is sufficient for achieving high performance. We demonstrate the proposed method first on synthetic hybrid systems, reducing the mode count by multiple orders of magnitude while achieving task performance loss of less than 5%. We also apply the proposed method to a three-fingered robotic hand manipulating a previously unknown object. With no prior knowledge, we achieve state-of-the-art closed-loop performance within a few minutes of online learning, by collecting only a few thousand environment samples.
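As a rough illustration of what a hybrid mode is in a linear complementarity system (LCS), the sketch below steps a toy scalar-state LCS and counts how many modes a sweep over the state actually activates. The abstract does not spell out the LCS equations, so the form used here ($x_{t+1} = A x_t + B u_t + C \lambda_t + d$ with $0 \le \lambda_t \perp D x_t + E u_t + F \lambda_t + c \ge 0$), the restriction to a diagonal positive $F$, the function name `lcs_step`, and all parameter values are assumptions made for illustration, not the authors' model or code.

```python
def lcs_step(x, u, A, B, C, d, D, E, F_diag, c):
    """One step of a toy scalar-state LCS (assumed form, not the paper's code).

    Dynamics:        x_next = A*x + B*u + sum_i C[i]*lam[i] + d
    Complementarity: 0 <= lam[i]  perp  D[i]*x + E[i]*u + F_diag[i]*lam[i] + c[i] >= 0

    Assumption: F is diagonal with positive entries, so each complementarity
    condition decouples and has the closed form lam[i] = max(0, -q_i / F_ii).
    """
    lam = []
    for D_i, E_i, F_ii, c_i in zip(D, E, F_diag, c):
        q_i = D_i * x + E_i * u + c_i
        lam.append(max(0.0, -q_i / F_ii))
    x_next = A * x + B * u + sum(C_i * l for C_i, l in zip(C, lam)) + d
    # The hybrid mode is the pattern of which lam entries are strictly positive.
    mode = tuple(l > 0.0 for l in lam)
    return x_next, lam, mode


# Hypothetical parameters: three complementarity variables, so 2**3 = 8
# on/off patterns are conceivable.
params = dict(
    A=0.9, B=1.0, C=[0.5, 0.5, 0.5], d=0.0,
    D=[1.0, 1.0, 1.0], E=[0.0, 0.0, 0.0],
    F_diag=[1.0, 1.0, 1.0], c=[1.0, 0.0, -1.0],
)

# Sweep the state over [-2, 2] with zero input and record the modes that occur.
grid = [i * 0.25 - 2.0 for i in range(17)]
visited = {lcs_step(x, 0.0, **params)[2] for x in grid}

possible = 2 ** len(params["c"])
print(f"{len(visited)} of {possible} conceivable modes are visited")
```

In this toy system the geometry alone rules out half of the on/off patterns. The paper's point is related but stronger: among the modes a full-order model does contain, only a small task-relevant subset needs to be represented for good closed-loop control.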

Paper Structure (43 sections, 2 theorems, 48 equations, 15 figures, 8 tables)

Key Result

Lemma 1

Suppose Assumption 1 holds. For any reduced-order model $\boldsymbol{g}()$, the following inequality holds:

Figures (15)

  • Figure 1: Components of the task-driven hybrid model reduction algorithm. There are three main components: learning the reduced-order LCS, the trust-region LCS model predictive controller, and the rollout buffer, each of which is detailed in the text.
  • Figure 2: Task-driven hybrid model reduction
  • Figure 3: Phase portraits of the MPC-controlled full-order dynamics $\boldsymbol{x}_{t+1}=\boldsymbol{f}(\boldsymbol{x}_t, \text{MPC}(\boldsymbol{x}_t))$, where the controller $\boldsymbol{u}_t=\text{MPC}(\boldsymbol{x}_t)$ can be either full-order $\boldsymbol{f}$-MPC or reduced-order $\boldsymbol{g}$-MPC. (a) is the phase portrait for the $\boldsymbol{f}$-MPC controller, where different colors indicate different hybrid modes (42 modes here) in $\boldsymbol{f}()$; (b) is the phase comparison between the $\boldsymbol{f}$-MPC and $\boldsymbol{g}$-MPC controllers at learning iteration 0; (c) is the phase portrait for the $\boldsymbol{g}$-MPC controller at learning iteration 24, where different colors indicate different hybrid modes (4 modes here) in $\boldsymbol{g}()$; and (d) is the phase comparison between the $\boldsymbol{f}$-MPC controller and the $\boldsymbol{g}$-MPC controller at learning iteration 24.
  • Figure 4: An example rollout of $\boldsymbol{f}()$ with the full-order $\boldsymbol{f}$-MPC controller or the reduced-order $\boldsymbol{g}$-MPC controller, corresponding to Case 7 in Table \ref{table.pwa.ex2}. Specifically, the left column is a single rollout of running the $\boldsymbol{f}$-MPC controller on $\boldsymbol{f}()$, and the right column is a single rollout of running the $\boldsymbol{g}$-MPC controller on $\boldsymbol{f}()$, both under the same initial condition. The upper row shows the activation of $\boldsymbol{\Lambda}$ or $\boldsymbol{\lambda}$ over time (a black brick means nonzero and a blank means zero). The bottom row shows the state trajectory $\boldsymbol{x}_{t+1}=\boldsymbol{f}(\boldsymbol{x}_t, \text{MPC}(\boldsymbol{x}_t))$ over time, with each color representing a different hybrid mode. Note that since each panel shows only a single rollout (from a fixed $\boldsymbol{x}_0$), few hybrid modes of $\boldsymbol{f}$ are active in a single trajectory.
  • Figure 5: Ablation study of the effect of hyperparameter values on the algorithm performance. We use the performance metrics On-policy ME in (\ref{equ.pwa.mpe}) and relative task performance gap $\mathcal{L}(\boldsymbol{g})(\%)$ in (\ref{equ.pwa.loss}) to report the algorithm performance. The system dimensions and other experiment settings follow Case 1 in Table \ref{table.pwa.ex2}. Each result is reported based on ten trials, and each trial uses a different randomly generated full-order LCS $\boldsymbol{f}()$, as stated in the previous section. The error bars represent the standard deviation across all trials.
  • ...and 10 more figures

Theorems & Definitions (7)

  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • proof
  • Remark
  • proof