Efficient Knowledge Transfer for Jump-Starting Control Policy Learning of Multirotors through Physics-Aware Neural Architectures

Welf Rehberg, Mihir Kulkarni, Philipp Weiss, Kostas Alexis

TL;DR

This work focuses on accelerating policy training using a library-based initialization scheme that enables effective knowledge transfer across multirotor configurations by leveraging a physics-aware neural control architecture that combines a reinforcement learning-based controller and a supervised control allocation network.

Abstract

Efficiently training control policies for robots is a major challenge that can greatly benefit from utilizing knowledge gained from training similar systems through cross-embodiment knowledge transfer. In this work, we focus on accelerating policy training using a library-based initialization scheme that enables effective knowledge transfer across multirotor configurations. By leveraging a physics-aware neural control architecture that combines a reinforcement learning-based controller and a supervised control allocation network, we enable the reuse of previously trained policies. To this end, we utilize a policy evaluation-based similarity measure that identifies suitable policies for initialization from a library. We demonstrate that this measure correlates with the reduction in environment interactions needed to reach target performance and is therefore suited for initialization. Extensive simulation and real-world experiments confirm that our control architecture achieves state-of-the-art control performance, and that our initialization scheme saves on average up to $73.5\%$ of environment interactions (compared to training a policy from scratch) across diverse quadrotor and hexarotor designs, paving the way for efficient cross-embodiment transfer in reinforcement learning.
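The library lookup described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the similarity measure is the mean episodic return a library policy achieves when evaluated on the new configuration. The names `select_initial_policy`, `rollout`, and `toy_rollout`, as well as the toy numbers, are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, Tuple

# Hypothetical interface: rollout(policy, seed) returns the episodic return
# of `policy` when run on the NEW multirotor configuration.
RolloutFn = Callable[[object, int], float]


def select_initial_policy(
    library: Dict[str, object], rollout: RolloutFn, n_episodes: int = 8
) -> Tuple[str, Dict[str, float]]:
    """Pick the library policy with the highest mean return on the new configuration."""
    if not library:
        raise ValueError("policy library is empty")
    # Assumed similarity score: mean return over a few evaluation episodes.
    scores = {
        name: mean(rollout(policy, seed) for seed in range(n_episodes))
        for name, policy in library.items()
    }
    best_name = max(scores, key=scores.get)
    return best_name, scores


def toy_rollout(policy_gain: float, seed: int) -> float:
    """Toy stand-in for a simulator: each 'policy' is just a scalar gain."""
    ideal_gain = 1.2  # hypothetical value representing the new airframe
    noise = 0.01 * ((seed % 3) - 1)  # small deterministic perturbation
    return -((policy_gain - ideal_gain) ** 2) + noise


library = {"quad_small": 1.0, "quad_large": 2.0, "hexa": 1.6}
best_name, scores = select_initial_policy(library, toy_rollout)
print(best_name, scores)
```

In a real pipeline, the selected policy's weights would initialize the reinforcement learning controller for the new configuration, while the configuration-specific allocation network is trained separately via supervised learning.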

Paper Structure (15 sections, 11 equations, 10 figures, 2 tables, 1 algorithm)

Figures (10)

  • Figure 1: The proposed library-based initialization scheme. To train a policy for a new configuration, we first train a configuration-specific control allocation network. Subsequently, a suitable policy for initializing the training of the control policy is picked from a library of policies using a reward-based similarity measure. The resulting training time is significantly lower than when training a policy from scratch.
  • Figure 2: Description of arbitrary airframes considered in this work.
  • Figure 3: Proposed physics-aware neural control architecture. The architecture is split into a controller and an allocation network, which are trained separately. The control network is trained using RL, while the allocation network is trained via supervised learning.
  • Figure 4: Transformation of wrench commands.
  • Figure 5: Sampling range for the configurations in the pool, with $l_{\min}=0.1\,\mathrm{m}$, $l_{\max}=0.35\,\mathrm{m}$, $\gamma = 60^\circ$, and $\phi=20^\circ$.
  • ...and 5 more figures