
Adaptive Event-triggered Reinforcement Learning Control for Complex Nonlinear Systems

Umer Siddique, Abhinav Sinha, Yongcan Cao

TL;DR

This paper proposes an adaptive event-triggered reinforcement learning control method for continuous-time nonlinear systems that have complex interactions and are subject to bounded uncertainties. The method jointly learns the control policy and the communication policy.

Abstract

In this paper, we propose an adaptive event-triggered reinforcement learning control method for continuous-time nonlinear systems that are subject to bounded uncertainties and characterized by complex interactions. Specifically, the proposed method jointly learns the control policy and the communication policy, thereby reducing the number of parameters and the computational overhead compared with learning the two policies separately or learning only one of them. By augmenting the state space with accrued rewards that represent the performance over the entire trajectory, we show that triggering conditions can be determined accurately and efficiently without explicitly learning them, which leads to an adaptive non-stationary policy. Finally, we provide several numerical examples to demonstrate the effectiveness of the proposed approach.
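The idea of augmenting the state with accrued reward and letting a single policy emit both a control value and a communication decision can be illustrated with a small sketch. The code below is not the authors' implementation: the class `EventTriggeredIntegrator`, the function `joint_policy`, the quadratic reward, the communication cost, the inclusion of the held control in the observation, and all numeric values are assumptions chosen for illustration. A hand-written rule stands in for the learned policy so that the example runs without a training loop.

```python
import numpy as np


class EventTriggeredIntegrator:
    """Perturbed single integrator x' = u + d with |d| <= d_max.

    Hypothetical toy setup: the dynamics, reward, and costs are assumptions.
    """

    def __init__(self, dt=0.05, d_max=0.1, comm_cost=0.01, seed=0):
        self.dt, self.d_max, self.comm_cost = dt, d_max, comm_cost
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self, x0=1.0):
        self.x = float(x0)
        self.u_held = 0.0   # last transmitted control (zero-order hold)
        self.accrued = 0.0  # reward accumulated so far on this trajectory
        self.n_updates = 0  # number of control transmissions
        return self._obs()

    def _obs(self):
        # Augmented observation: plant state, accrued reward, held control.
        # Including the held control is an assumption of this sketch.
        return np.array([self.x, self.accrued, self.u_held])

    def step(self, action):
        u, trigger = action
        fired = trigger > 0.5
        if fired:
            # Communication event: the actuator receives a new control value.
            self.u_held = float(u)
            self.n_updates += 1
        # Bounded disturbance, then one explicit Euler step of the dynamics.
        d = self.rng.uniform(-self.d_max, self.d_max)
        self.x += self.dt * (self.u_held + d)
        # Assumed reward: state penalty plus a cost per transmission.
        reward = -self.dt * self.x**2 - (self.comm_cost if fired else 0.0)
        self.accrued += reward
        return self._obs(), reward


def joint_policy(obs, gain=2.0, threshold=0.05):
    """Stand-in for a learned policy: returns (control, trigger) together."""
    x, accrued, u_held = obs  # accrued is available but unused by this rule
    u = -gain * x
    trigger = 1.0 if abs(u - u_held) > threshold else 0.0
    return u, trigger


def rollout(steps=200):
    env = EventTriggeredIntegrator()
    obs = env.reset()
    total = 0.0
    for _ in range(steps):
        obs, reward = env.step(joint_policy(obs))
        total += reward
    return env, total
```

Calling `rollout()` and comparing `env.n_updates` with the number of steps shows how often the control was actually transmitted. In a learned version, a policy-gradient method such as PPO would replace `joint_policy`, and the accrued-reward entry would let the policy condition its triggering behavior on performance so far.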

Paper Structure

This paper contains 12 sections, 12 equations, 5 figures.

Figures (5)

  • Figure 1: Performance comparison between PPO and ATPPO for a perturbed single integrator.
  • Figure 2: Performance comparison between PPO and ATPPO in the pursuit-evasion environment.
  • Figure 3: Performance comparison between PPO and ATPPO in the HalfCheetah environment.
  • Figure 4: Performance comparison between PPO and ATPPO in the Hopper environment.
  • Figure 5: Performance comparison between PPO and ATPPO in the Reacher environment.