DTC: Deep Tracking Control

Fabian Jenelten, Junzhe He, Farbod Farshidian, Marco Hutter

TL;DR

This work proposes a hybrid control architecture that combines the predictive capabilities and optimality guarantees of online planning with the inherent robustness attributed to offline learning, and demonstrates superior robustness in the presence of slippery or deformable ground when compared with model-based counterparts.

Abstract

Legged locomotion is a complex control problem that requires both accuracy and robustness to cope with real-world challenges. Legged systems have traditionally been controlled using trajectory optimization with inverse dynamics. Such hierarchical model-based methods are appealing due to intuitive cost function tuning, accurate planning, generalization, and most importantly, the insightful understanding gained from more than one decade of extensive research. However, model mismatch and violation of assumptions are common sources of faulty operation. Simulation-based reinforcement learning, on the other hand, results in locomotion policies with unprecedented robustness and recovery skills. Yet, all learning algorithms struggle with sparse rewards emerging from environments where valid footholds are rare, such as gaps or stepping stones. In this work, we propose a hybrid control architecture that combines the advantages of both worlds to simultaneously achieve greater robustness, foot-placement accuracy, and terrain generalization. Our approach utilizes a model-based planner to roll out a reference motion during training. A deep neural network policy is trained in simulation, aiming to track the optimized footholds. We evaluate the accuracy of our locomotion pipeline on sparse terrains, where pure data-driven methods are prone to fail. Furthermore, we demonstrate superior robustness in the presence of slippery or deformable ground when compared to model-based counterparts. Finally, we show that our proposed tracking controller generalizes across different trajectory optimization methods not seen during training. In conclusion, our work unites the predictive capabilities and optimality guarantees of online planning with the inherent robustness attributed to offline learning.

Paper Structure (35 sections, 3 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Robust and precise locomotion in various indoor and outdoor environments. The marriage of model-free and model-based control allows legged robots to be deployed in environments where steppable contact surfaces are sparse (bottom left) and environmental uncertainties are high (top right).
  • Figure 2: Evaluation of robustness. (A) ANYmal walks along a loose cover plate that eventually pitches forward (left to right, top to bottom). The third row shows ANYmal's perception of the surroundings during the transition and recovery phase. (B) The snapshots are taken at critical time instances when walking on slippery ground, just before complete recovery. (C) ANYmal climbs upstairs with disabled perception (top to bottom). The collision of the right-front end-effector with the stair tread triggers a swing reflex, visualized in orange.
  • Figure 3: Evaluation of tracking performance. (A) ANYmal climbs up a narrow table, turns, and descends back down to a box. The second image in the second row shows the robot's perception of the environment. (B) Euclidean norm of the planar foothold error, averaged over $20\,\mathrm{s}$ of operation using a constant heading velocity. The solid/dashed curves represent the average/maximum tracking errors. (C) Same representation as in (B), but the data was collected with baseline-to-2. (D) DTC deployed with baseline-to-2, enabling ANYmal to climb up a box of $0.48\,\mathrm{m}$.
  • Figure 4: Benchmarking against model-based control. (A) DTC successfully traverses an obstacle parkour (left to right) in simulation with a heading velocity of $1\,\mathrm{m/s}$. (B) Baseline-to-1 falls after stepping into a gap hidden from the perception (left to right). (C) ANYmal successfully overcomes a trapped floor using our hybrid control architecture (left to right).
  • Figure 5: Benchmarking against reinforcement learning. (A) Baseline-rl-1 attempts to cross a small gap. ANYmal initially manages to recover from misstepping with its front legs but subsequently gets stuck as its hind legs fall inside the gap. (B) Using baseline-rl-1, the robot stumbles along a narrow beam. (C) With DTC, the robot can pass four consecutive large gaps (left to right) without getting stuck or falling. (D) ANYmal crosses a long beam using the proposed control framework.
  • ...and 3 more figures
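
The tracking metric named in the caption of Figure 3 (B), the Euclidean norm of the planar foothold error reported as an average and a maximum, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the tuple layout, and the sample footholds below are hypothetical and chosen only for illustration. The sketch assumes each planned foothold is paired with the touchdown position actually reached, and that only the horizontal (x, y) components enter the error.

```python
import math


def planar_foothold_errors(planned, actual):
    """Euclidean norm of the planar (x, y) error for each planned/actual pair.

    Any third (height) component in the tuples is ignored, since the
    metric is planar. Hypothetical helper, not from the paper.
    """
    if len(planned) != len(actual):
        raise ValueError("planned and actual must have the same length")
    return [
        math.hypot(p[0] - a[0], p[1] - a[1])
        for p, a in zip(planned, actual)
    ]


def summarize(errors):
    """Return (average, maximum) of a non-empty list of errors."""
    if not errors:
        raise ValueError("no footholds to evaluate")
    return sum(errors) / len(errors), max(errors)


# Made-up footholds in meters, for illustration only.
planned = [(0.30, 0.15), (0.60, -0.15), (0.90, 0.15)]
actual = [(0.33, 0.19), (0.60, -0.15), (0.84, 0.07)]

errors = planar_foothold_errors(planned, actual)
mean_error, max_error = summarize(errors)
print(f"average error: {mean_error:.3f} m, maximum error: {max_error:.3f} m")
```

In the figure, the average and maximum correspond to the solid and dashed curves, respectively. How the recorded touchdowns are grouped to produce those curves is not stated in this excerpt, so the sketch only covers the per-foothold error and its two summary statistics.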