
Transfer Learning for a Class of Cascade Dynamical Systems

Shima Rabiei, Sandipan Mishra, Santiago Paternain

TL;DR

The paper addresses the transfer of a policy learned on a reduced-order RL model to a full cascade dynamical system by exploiting an inner-loop controller. It establishes transfer guarantees based on input-to-state stability (ISS). These guarantees bound the performance degradation in terms of the inner-loop stability parameters $\alpha$ and $\beta$, the variation of the reference, and a Lipschitz constant $L$ linking the reduced-model transitions to the commanded state $X^\star$. The authors validate the theory on a quadrotor navigation task, showing that increasing the inner-loop gain $K_p$ (which reduces $\alpha$) reduces the transfer loss and brings the full-state and reduced-order dynamics into closer agreement. This work provides a principled framework for safe and efficient RL transfer in systems with nested control loops, along with practical guidance on designing the inner-loop controller to improve transfer fidelity.
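As an illustration of how $\alpha$ and the reference variation can enter a bound of this kind, consider a hypothetical scalar inner loop. This is an assumption made for exposition, not the paper's model or theorem: a first-order proportional loop, discretized with step $\Delta t$, in which the state $\theta$ must track a command $\theta^\star$. Unrolling the tracking-error recursion gives an ISS-type estimate:

```latex
\begin{aligned}
\theta_{t+1} &= \theta_t + \Delta t\, K_p\,(\theta^\star_t - \theta_t),
  \qquad e_t := \theta^\star_t - \theta_t,\\
e_{t+1} &= (1 - \Delta t\, K_p)\, e_t
  + \underbrace{(\theta^\star_{t+1} - \theta^\star_t)}_{=:\,\delta_t},\\
|e_t| &\le \alpha^t\, |e_0| + \sum_{k=0}^{t-1} \alpha^{t-1-k}\, |\delta_k|
  \;\le\; \alpha^t\, |e_0| + \frac{1}{1-\alpha}\,\max_{0 \le k < t} |\delta_k|,
  \qquad \alpha := |1 - \Delta t\, K_p| < 1 .
\end{aligned}
```

For $0 < \Delta t\,K_p \le 1$, increasing $K_p$ lowers $\alpha$. This shrinks both the decaying transient and the gain $1/(1-\alpha)$ on the reference variation. In this toy loop, the constant multiplying the transient is simply 1.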

Abstract

This work considers the problem of transfer learning in the context of reinforcement learning. Specifically, we consider training a policy in a reduced-order system and deploying it in the full-state system. The motivation for this training strategy is that running simulations of the full-state system may take excessive time if the dynamics are complex. While transfer learning alleviates the computational burden, the transfer guarantees depend on the discrepancy between the two systems. In this work, we consider a class of cascade dynamical systems in which the dynamics of a subset of the states influence the remaining states but not vice versa. The reinforcement learning policy is trained in a model that ignores the dynamics of these states and treats them as commanded inputs. In the full-state system, these dynamics are handled by a classical controller (e.g., a PID). Such systems have numerous applications in control, and their structure allows us to provide transfer guarantees that depend on the stability of the inner-loop controller. Numerical experiments on a quadrotor support the theoretical findings.

Paper Structure

This paper contains 13 sections, 45 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Overview of the proposed approach. The RL agent is trained only on the reduced-order model, in which the inner loop is assumed to have a unit transfer function (so the state $X_t$ equals its command $X_t^\star$). Thus, $X_t^\star$ and $A_t$ are the actions of the RL agent. The learned policy is then transferred to the full system, which includes the dynamics of the state $X$.
  • Figure 2: Average relative difference in expected discounted return between the full-state system and the reduced-order model for various values of $K_p$, averaged over 100 experiments with randomly initialized states $S$. In accordance with Theorem 2, increasing $K_p$ results in less performance degradation.
  • Figure 3: Mean and standard deviation of the distance to the target over 100 iterations for proportional gains $K_p$ of 1, 2, and 60.
  • Figure 4: Mean and standard deviation of the orientation error computed over 100 iterations with random initial states $S$. As $K_p$ increases, the orientation error approaches zero, which reduces the total variation distance between the transition kernels of the two systems. This result is consistent with the paper's proposition on inner-loop stability.
  • Figure 5: Comparison of the orientation trajectories of the full-state and reduced-order models for different values of $K_p$. As in Figure 4, the larger $K_p$, the more closely $\theta$ tracks the reference $\theta^\star$. Furthermore, the larger $K_p$, the smaller the variation in $\theta^\star$. According to Theorem 2, large variation in the reference worsens transfer, which is consistent with this experiment.
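The qualitative trend in Figures 2 to 5 can be reproduced on a much simpler system. The sketch below is a hypothetical toy cascade, not the paper's quadrotor. It uses a position/velocity outer loop driven by a tilt angle $\theta$, a fixed PD law standing in for the trained policy, and a first-order proportional inner loop. The names `policy`, `rollout`, and `relative_gap` and all constants are assumptions made for illustration. The code compares the discounted return of the reduced-order model (unit inner-loop transfer function) with that of the full model. At every step it also checks the tracking-error bound $|e_t| \le \alpha^t|e_0| + \sum_{k<t}\alpha^{t-1-k}|\theta^\star_{k+1}-\theta^\star_k|$ with $\alpha = |1 - \Delta t\,K_p|$, which follows from the error recursion of this proportional loop.

```python
import math
from typing import Optional, Tuple

# Hypothetical toy cascade (an assumption, not the paper's quadrotor model):
#   outer states (p, v): position and velocity, driven by a tilt angle;
#   inner state theta:   tilt angle, commanded as theta_ref by the policy.
G = 9.81         # gravity [m/s^2]
DT = 0.01        # integration step [s]
GAMMA = 0.995    # per-step discount factor
HORIZON = 1000   # 10 s of simulated time
MAX_TILT = 0.3   # saturation of the tilt command [rad]


def policy(p: float, v: float) -> float:
    """Stand-in for a policy trained on the reduced model (here a fixed PD law)."""
    cmd = -(0.1 * p + 0.2 * v)
    return max(-MAX_TILT, min(MAX_TILT, cmd))


def rollout(p0: float, kp: Optional[float] = None) -> Tuple[float, bool]:
    """Return (discounted return, whether the tracking-error bound always held).

    kp is None: reduced-order model; the inner loop is a unit transfer function,
                so the applied tilt equals the command at every step.
    kp > 0:     full model; theta follows the command through the
                proportional loop theta += DT * kp * (theta_ref - theta).
    """
    full = kp is not None
    alpha = abs(1.0 - DT * kp) if full else 0.0
    p, v, theta = p0, 0.0, 0.0
    ref = policy(p, v)
    bound = abs(ref - theta)  # S_0 = |e_0|, with e_t = theta_ref_t - theta_t
    ret, bound_ok = 0.0, True
    for t in range(HORIZON):
        ret -= GAMMA**t * p * p           # reward: negative squared distance to target
        tilt = theta if full else ref     # angle actually seen by the outer loop
        v += DT * G * math.sin(tilt)
        p += DT * v
        if full:
            theta += DT * kp * (ref - theta)  # inner proportional loop
        new_ref = policy(p, v)
        if full:
            # S_{t+1} = alpha * S_t + |ref_{t+1} - ref_t| upper-bounds |e_{t+1}|
            bound = alpha * bound + abs(new_ref - ref)
            bound_ok = bound_ok and abs(new_ref - theta) <= bound + 1e-9
        ref = new_ref
    return ret, bound_ok


def relative_gap(kp: float, p0: float = 1.0) -> float:
    """|J_full - J_reduced| / |J_reduced| for a single initial position."""
    j_red, _ = rollout(p0)
    j_full, _ = rollout(p0, kp)
    return abs(j_full - j_red) / abs(j_red)


for kp in (1.0, 2.0, 60.0):
    print(f"K_p = {kp:5.1f}: relative return gap = {relative_gap(kp):.4f}")
```

With these assumed constants, $\Delta t\,K_p \le 1$ for all three gains, so a larger $K_p$ means a smaller $\alpha$. In this discretization, raising $K_p$ beyond $1/\Delta t$ would increase $\alpha$ again, so the gain cannot be increased indefinitely.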

Theorems & Definitions (3)

  • proof
  • proof
  • proof