SuPLE: Robot Learning with Lyapunov Rewards

Phu Nguyen, Daniel Polani, Stas Tiomkin

TL;DR

This work explores using the Lyapunov exponents of the system dynamics to generate a system-immanent reward and demonstrates that the Sum of the Positive Lyapunov Exponents (SuPLE) is a strong candidate for designing such a reward.

Abstract

The reward function is an essential component in robot learning. The reward directly affects the sample and computational complexity of learning, as well as the quality of the solution. Designing informative rewards requires domain knowledge, which is not always available. We use the properties of the dynamics to produce system-appropriate rewards without adding external assumptions. Specifically, we explore an approach that uses the Lyapunov exponents of the system dynamics to generate a system-immanent reward. We demonstrate that the Sum of the Positive Lyapunov Exponents (SuPLE) is a strong candidate for the design of such a reward. We develop a computational framework for deriving this reward and demonstrate its effectiveness on classical benchmarks for sample-based stabilization of various dynamical systems. SuPLE eliminates the need to start training trajectories at arbitrary states, a practice known as auxiliary exploration. While auxiliary exploration is common in simulated robot learning, it is impractical for real robotic systems, which typically start from natural rest states (a pendulum hanging at the bottom, a robot lying on the ground, etc.) and cannot easily be initialized at arbitrary states. Comparing SuPLE with commonly used reward functions, we observe that the latter fail to find a solution without auxiliary exploration, even for the task of swinging up the double pendulum and keeping it stable in the upright position, a prototypical scenario for multi-linked robots. SuPLE-induced rewards offer a novel route to effective robot learning in typical, as opposed to highly specialized or fine-tuned, scenarios. Our code is publicly available for reproducibility and further research.
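
SuPLE is built from Lyapunov exponents of the dynamics. As a hedged illustration only, assuming finite-time exponents computed from the singular values $\sigma_i$ of the flow-map Jacobian over a horizon $T$ (the paper's own estimator may differ), the quantity can be written as below. The worked example uses a passive, undamped single pendulum with $\theta$ measured from the hanging position, which is an assumed convention.

```latex
\dot{x} = f(x), \qquad x(0) = x_0, \qquad \Phi_T(x_0) := x(T)

\dot{M}(t) = \frac{\partial f}{\partial x}\big(x(t)\big)\, M(t), \qquad M(0) = I
\quad\Longrightarrow\quad M(T) = \frac{\partial \Phi_T}{\partial x_0}(x_0)

\lambda_i^{T}(x_0) = \frac{1}{T}\, \ln \sigma_i\big(M(T)\big), \qquad i = 1, \dots, n

\mathrm{SuPLE}_T(x_0) = \sum_{i=1}^{n} \max\{\lambda_i^{T}(x_0),\, 0\}

% Worked example: x = (\theta, \dot\theta), \; f(x) = (\dot\theta, -(g/l)\sin\theta)
\text{upright } (\theta = \pi,\ \dot\theta = 0):\quad
\frac{\partial f}{\partial x} = \begin{pmatrix} 0 & 1 \\ g/l & 0 \end{pmatrix},\ 
\text{eigenvalues } \pm\sqrt{g/l}
\ \Rightarrow\ \lim_{T \to \infty} \mathrm{SuPLE}_T = \sqrt{g/l}

\text{hanging } (\theta = 0,\ \dot\theta = 0):\quad
\frac{\partial f}{\partial x} = \begin{pmatrix} 0 & 1 \\ -g/l & 0 \end{pmatrix},\ 
\text{eigenvalues } \pm i\sqrt{g/l}
\ \Rightarrow\ \lim_{T \to \infty} \mathrm{SuPLE}_T = 0
```

Because both points are equilibria, $M(T) = e^{AT}$ with the constant Jacobian $A$ shown, so the limits follow from the eigenvalues: the unstable upright state receives a positive value, while the oscillatory rest state receives zero.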

Paper Structure

This paper contains 14 sections, 5 equations, 4 figures, and 2 algorithms.

Figures (4)

  • Figure 2: SuPLE landscape of the single pendulum (x-axis: angle; y-axis: angular velocity).
  • Figure 3: Test error with respect to the upright position for different rewards, without random position resetting during training. As the systems become more complex (left to right), the Sparse and Error-Based (Quadratic) rewards fail, whereas SuPLE succeeds in all systems.
  • Figure 4: Pairwise representation of the 4D landscape. $\theta_1$ and $\dot{\theta}_1$ are the angle and angular velocity of the first pole; $\theta_2$ and $\dot{\theta}_2$ are the analogous quantities for the second pole. Both poles are aligned at the top position when $\theta_1 = \frac{\pi}{2}\,\mathrm{rad}$ and $\theta_2 = 0\,\mathrm{rad}$.
  • Figure 5: Double pendulum test error with random position resetting in training.
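
Figure 2 shows the SuPLE landscape of the single pendulum over angle and angular velocity. The minimal sketch below approximates such a landscape under several assumptions that do not come from the paper. The pendulum is passive (no control input). The angle is measured from the hanging position, so $\theta = \pi$ is upright. The parameters `G`, `L`, and `B` (damping) are illustrative. The exponents are finite-time estimates: the variational equation is integrated with RK4 over a hypothetical `horizon`, and the singular values of the resulting Jacobian are taken. The function name `suple` is hypothetical and is not the authors' code.

```python
import numpy as np

# Illustrative (hypothetical) pendulum parameters, not taken from the paper.
G = 9.81   # gravitational acceleration [m/s^2]
L = 1.0    # pole length [m]
B = 0.1    # viscous damping coefficient [1/s]


def _deriv(x, phi):
    """Vector field f(x) and variational term (df/dx) @ phi for a passive pendulum.

    x has shape (..., 2) holding (theta, omega); phi has shape (..., 2, 2).
    Assumed convention: theta = 0 is the hanging rest state, theta = pi is upright.
    """
    theta, omega = x[..., 0], x[..., 1]
    dx = np.stack([omega, -(G / L) * np.sin(theta) - B * omega], axis=-1)
    # Jacobian of the vector field, evaluated at every state in the batch.
    jac = np.zeros(x.shape[:-1] + (2, 2))
    jac[..., 0, 1] = 1.0
    jac[..., 1, 0] = -(G / L) * np.cos(theta)
    jac[..., 1, 1] = -B
    return dx, jac @ phi


def suple(states, horizon=5.0, dt=0.01):
    """Finite-time sum of positive Lyapunov exponents for each state in `states`.

    Integrates x and the tangent map M (dM/dt = df/dx M, M(0) = I) with RK4,
    then sums the positive values of log(singular values of M(T)) / T.
    """
    x = np.asarray(states, dtype=float)
    phi = np.broadcast_to(np.eye(2), x.shape[:-1] + (2, 2)).copy()
    for _ in range(int(round(horizon / dt))):
        k1x, k1p = _deriv(x, phi)
        k2x, k2p = _deriv(x + 0.5 * dt * k1x, phi + 0.5 * dt * k1p)
        k3x, k3p = _deriv(x + 0.5 * dt * k2x, phi + 0.5 * dt * k2p)
        k4x, k4p = _deriv(x + dt * k3x, phi + dt * k3p)
        x = x + dt / 6.0 * (k1x + 2 * k2x + 2 * k3x + k4x)
        phi = phi + dt / 6.0 * (k1p + 2 * k2p + 2 * k3p + k4p)
    sigma = np.linalg.svd(phi, compute_uv=False)  # batched over leading axes
    lam = np.log(sigma) / horizon
    return np.where(lam > 0.0, lam, 0.0).sum(axis=-1)


if __name__ == "__main__":
    # Landscape over angle (x-axis) and angular velocity (y-axis), as in Figure 2.
    thetas = np.linspace(-np.pi, np.pi, 41)
    omegas = np.linspace(-6.0, 6.0, 41)
    grid = np.stack(np.meshgrid(thetas, omegas), axis=-1)  # shape (41, 41, 2)
    landscape = suple(grid)
    print("upright:", float(suple([np.pi, 0.0])))
    print("hanging:", float(suple([0.0, 0.0])))
```

In this sketch the unstable upright equilibrium receives a large value, while the hanging rest state scores near zero. The horizon is a design choice. Very short horizons can inflate the exponents at oscillatory states because $\theta$ and $\dot{\theta}$ have different scales. Longer horizons average this effect out but make the value less local to the starting state.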