
Robust Agility via Learned Zero Dynamics Policies

Noel Csomay-Shanklin, William D. Compton, Ivan Dario Jimenez Rodriguez, Eric R. Ambrose, Yisong Yue, Aaron D. Ames

TL;DR

Zero Dynamics Policies exploit the structure of underactuation by restricting the inputs of the target mapping to the subset of degrees of freedom that cannot be directly actuated, thereby achieving significant dimension reduction.

Abstract

We study the design of robust and agile controllers for hybrid underactuated systems. Our approach breaks down the task of creating a stabilizing controller into two steps: 1) learning a mapping that is invariant under optimal control, and 2) driving the actuated coordinates to the output of that mapping. This approach, termed Zero Dynamics Policies, exploits the structure of underactuation by restricting the inputs of the target mapping to the subset of degrees of freedom that cannot be directly actuated, thereby achieving significant dimension reduction. Furthermore, we retain the stability and constraint satisfaction of optimal control while reducing the online computational overhead. We prove that controllers of this type stabilize hybrid underactuated systems and experimentally validate our approach on the 3D hopping platform ARCHER. Over the course of 3000 hops, the proposed framework demonstrates robust agility, maintaining stable hopping while rejecting disturbances on rough terrain.
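The two-step decomposition in the abstract can be illustrated on a hypothetical toy system (an assumption for illustration, not the ARCHER model): one unactuated coordinate with $z_{k+1} = a z_k + b \eta_k$ and one actuated coordinate with $\eta_{k+1} = u_k$. The sketch below fixes a linear policy $\psi_\theta(z) = \theta z$ by hand instead of learning it, then chooses $u_k$ so that the error $\eta_k - \psi_\theta(z_k)$ shrinks by a factor `lam` every step. The function names `psi`, `zdp_control`, and `simulate` and all parameter values are illustrative only.

```python
"""Toy sketch of the Zero Dynamics Policy control structure (hypothetical system).

Assumed dynamics (not from the paper):
  z_{k+1}   = a * z_k + b * eta_k   (unactuated coordinate)
  eta_{k+1} = u_k                   (actuated coordinate)
"""


def psi(z: float, theta: float) -> float:
    """Hypothetical linear zero dynamics policy psi_theta(z) = theta * z."""
    return theta * z


def zdp_control(z: float, eta: float, theta: float, a: float, b: float, lam: float) -> float:
    """Pick u so that eta_{k+1} - psi(z_{k+1}) = lam * (eta_k - psi(z_k))."""
    z_next = a * z + b * eta  # the unactuated coordinate does not depend on u
    error = eta - psi(z, theta)
    return psi(z_next, theta) + lam * error  # eta_{k+1} = u in this toy model


def simulate(z0: float, eta0: float, theta: float = -0.7, a: float = 1.2,
             b: float = 1.0, lam: float = 0.3, steps: int = 60):
    """Roll out the closed loop and return the trajectory [(z, eta), ...]."""
    z, eta = z0, eta0
    traj = [(z, eta)]
    for _ in range(steps):
        u = zdp_control(z, eta, theta, a, b, lam)
        z, eta = a * z + b * eta, u  # right-hand side uses the old z and eta
        traj.append((z, eta))
    return traj


if __name__ == "__main__":
    # Open-loop z is unstable (a = 1.2), but on the surface eta = theta * z the
    # zero dynamics are z_{k+1} = (a + b * theta) z_k = 0.5 z_k, which is stable.
    final_z, final_eta = simulate(1.0, 0.5)[-1]
    print(f"final state: z={final_z:.2e}, eta={final_eta:.2e}")
```

The design choice worth noting is that the controller never stabilizes $z$ directly: it only drives $\eta$ onto the surface, and the surface itself is responsible for making the remaining $z$ dynamics stable.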

Paper Structure (19 sections, 2 theorems, 44 equations, 6 figures, 1 algorithm)

Key Result

Theorem 1

Consider a controlled invariant manifold $\mathcal{M}_{\mathbf{\psi}}$ whose zero dynamics are exponentially stable. Any control law that exponentially stabilizes $\|\mathbf{\eta}_k - \mathbf{\psi}_{\mathbf{\theta}}(\mathbf{z}_k)\|$ stabilizes the discrete-time composite system …
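In a linear toy instance (an assumption for illustration; the paper's result covers nonlinear hybrid systems and its proof is not reproduced here), the mechanism behind Theorem 1 can be seen directly. Take one unactuated coordinate with $z_{k+1} = a z_k + b \eta_k$, a linear policy $\psi_\theta(z) = \theta z$, and a control law that enforces $e_{k+1} = \lambda e_k$ for the error $e_k = \eta_k - \psi_\theta(z_k)$. Substituting $\eta_k = \theta z_k + e_k$ gives a block-triangular composite system:

```latex
\begin{aligned}
z_{k+1} &= a z_k + b\,(\theta z_k + e_k) = (a + b\theta)\, z_k + b\, e_k, \\
e_{k+1} &= \lambda\, e_k, \\
\begin{bmatrix} z_{k+1} \\ e_{k+1} \end{bmatrix}
&= \underbrace{\begin{bmatrix} a + b\theta & b \\ 0 & \lambda \end{bmatrix}}_{A_{\mathrm{cl}}}
\begin{bmatrix} z_k \\ e_k \end{bmatrix},
\qquad \operatorname{spec}(A_{\mathrm{cl}}) = \{\, a + b\theta,\ \lambda \,\}.
\end{aligned}
```

Here $|a + b\theta| < 1$ plays the role of exponentially stable zero dynamics on the surface, and $|\lambda| < 1$ plays the role of a control law that exponentially stabilizes the error. Both eigenvalues then lie inside the unit circle, and since $(z, e) \mapsto (z, \theta z + e)$ is an invertible change of coordinates, $(z_k, \eta_k) \to 0$ as well. For example, $a = 1.2$, $b = 1$, $\theta = -0.7$, $\lambda = 0.3$ give eigenvalues $0.5$ and $0.3$, even though the open-loop $z$ dynamics ($a = 1.2$) are unstable.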

Figures (6)

  • Figure 1: Experiments run with Zero Dynamics Policies: a) treadmill hopping with disturbances up to 1 mile per hour, b) climbing 1.5-inch stairs and descending a 20° ramp, c) disturbance rejection, and d) hopping across a 2×4.
  • Figure 2: A depiction of the two necessary properties of $\mathcal{M}_{ \mathbf{\psi} }$: a) invariance under the discrete map $\mathbf{F}$, and b) stability.
  • Figure 3: a) The loss function exactly measures the extent to which the manifold is not invariant under the optimal action; b) a Monte Carlo approximation of the spatial loss is used, wherein the optimal policy is backpropagated through to update the surface.
  • Figure 4: A snapshot of the experiments conducted with ARCHER, including set point tracking, disturbance rejection, and hopping over rough terrain.
  • Figure 5: Left: A comparison between LQR (top) and ZDPs (bottom) while tracking a 2 m setpoint. Right: The output of the trained policy and the actual state at impact over 3000 hops, as compared to an LQR controller.
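Figure 3 describes training the surface by measuring how far it is from invariant under the optimal action, using a Monte Carlo estimate of the loss that is differentiated through the optimal policy. The following is a minimal sketch of that idea under loud assumptions: a hypothetical linear toy system ($z_{k+1} = a z_k + b\eta_k$, $\eta_{k+1} = u_k$), an LQR gain standing in for the optimal policy, and a one-parameter surface $\psi_\theta(z) = \theta z$ with a hand-derived gradient in place of automatic differentiation. None of the function names or parameter values come from the paper.

```python
"""Toy sketch: learn an invariant surface eta = theta * z under an LQR policy.

Hypothetical setup (not the authors' system or code):
  z_{k+1}   = a z_k + b eta_k   (unactuated)
  eta_{k+1} = u_k               (actuated)
An LQR gain K stands in for the optimal policy u = -K x, with x = (z, eta).
"""
import random


def matmul(X, Y):
    """Multiply two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def lqr_gain(A, B, Q, r, iters=500):
    """Single-input discrete-time LQR gain via Riccati value iteration."""
    n = len(A)
    P = [row[:] for row in Q]
    for _ in range(iters):
        BtPA = matmul(transpose(B), matmul(P, A))[0]      # row vector B^T P A
        s = r + matmul(transpose(B), matmul(P, B))[0][0]  # scalar r + B^T P B
        AtPA = matmul(transpose(A), matmul(P, A))
        # P <- Q + A^T P A - (A^T P B) s^{-1} (B^T P A), using symmetry of P
        P = [[Q[i][j] + AtPA[i][j] - BtPA[i] * BtPA[j] / s for j in range(n)]
             for i in range(n)]
    BtPA = matmul(transpose(B), matmul(P, A))[0]
    s = r + matmul(transpose(B), matmul(P, B))[0][0]
    return [v / s for v in BtPA]


def loss_and_grad(theta, zs, a, b, K):
    """Monte Carlo invariance loss mean((eta+ - theta * z+)^2) and its d/dtheta.

    The state x = (z, theta * z) depends on theta, so the gradient flows
    through the policy action u = -K x as well as through the surface.
    """
    loss, grad = 0.0, 0.0
    for z in zs:
        eta = theta * z                       # start on the surface
        u = -(K[0] * z + K[1] * eta)          # LQR action (stand-in for optimal)
        z_next, eta_next = a * z + b * eta, u
        res = eta_next - theta * z_next       # distance off the surface after one step
        # d(eta)/dtheta = z, d(u)/dtheta = -K[1] z, d(z_next)/dtheta = b z
        d_res = -K[1] * z - z_next - theta * b * z
        loss += res * res
        grad += 2.0 * res * d_res
    return loss / len(zs), grad / len(zs)


def train(a=1.2, b=1.0, q=1.0, s=0.1, r=1.0, lr=0.2, steps=3000, batch=64, seed=0):
    """Gradient descent on theta with a fresh Monte Carlo batch of z each step."""
    A = [[a, b], [0.0, 0.0]]
    B = [[0.0], [1.0]]
    K = lqr_gain(A, B, [[q, 0.0], [0.0, s]], r)
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        zs = [rng.uniform(-1.0, 1.0) for _ in range(batch)]
        _, g = loss_and_grad(theta, zs, a, b, K)
        theta -= lr * g
    return theta, K


if __name__ == "__main__":
    theta, K = train()
    print(f"K = {K}, learned theta = {theta:.6f}, zero dynamics z+ = {1.2 + theta:.4f} z")
```

In this toy the residual factors as $z \cdot g(\theta)$ with $g$ quadratic, so the invariant surfaces are exactly the real eigenvector directions of the closed loop. Starting from $\theta = 0$, gradient descent settles on $\theta = -K_1 / b$, whose zero dynamics $z_{k+1} = (a + b\theta) z_k$ inherit the stable closed-loop eigenvalue. The paper's setting (hybrid nonlinear dynamics, a learned function approximator for $\psi_\theta$, and a nonlinear optimal controller) is far richer; the sketch only makes the shape of the loss concrete.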

Theorems & Definitions (7)

  • Proof
  • Remark 1
  • Theorem 1
  • Proof
  • Remark 2
  • Theorem 2
  • Proof