Transfer Learning for LQR Control

Taosha Guo, Fabio Pasqualetti

TL;DR

The paper tackles learning an LQR controller for a target system with unknown dynamics by exploiting impulse-response data from both the target and multiple source systems. It derives a closed-form, data-driven LQR gain from impulse responses and weight matrices, and introduces a transfer-learning framework that reconstructs the target impulse response using a mode set learned from the source systems. The key contributions are (i) a mode-based method that reduces the number of target samples required from $2n$ to $n$, (ii) an algorithm to identify a transferable mode dictionary $\Lambda$ and estimate the target modes $\Sigma$, and (iii) a demonstration that incorporating source data can halve the sample complexity of controller synthesis. The approach enables data-efficient, transferable LQR design for scenarios where target-system data are scarce but related-source data are plentiful.
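Why knowing the modes saves samples can be seen on a toy case. If a scalar impulse response is a sum of $n$ distinct modes, $h[k] = \sum_i c_i \lambda_i^k$, and the modes $\lambda_i$ are already known (for instance, supplied by a learned mode set), then only the $n$ weights $c_i$ are unknown and $n$ samples determine them through a square Vandermonde system. The sketch below assumes a hypothetical single-input single-output target with distinct real modes and noise-free data; `vandermonde` and `fit_mode_weights` are illustrative names, not the paper's algorithm.

```python
import numpy as np

def vandermonde(modes, T):
    """Rows k = 0..T-1 of the matrix [lam_1**k, ..., lam_n**k]."""
    k = np.arange(T)
    return modes[np.newaxis, :] ** k[:, np.newaxis]

def fit_mode_weights(h, modes):
    """Least-squares weights c such that h[k] ~= sum_i c_i * modes[i]**k."""
    c, *_ = np.linalg.lstsq(vandermonde(modes, len(h)), h, rcond=None)
    return c

# Hypothetical SISO target x+ = A x + b u, y = c_out x, with distinct real modes.
# h[k] = c_out A^k b is the impulse-response sample M(k+1).
rng = np.random.default_rng(1)
modes = np.array([0.9, 0.5, -0.3])      # assumed known in advance
P = rng.standard_normal((3, 3))
A = P @ np.diag(modes) @ np.linalg.inv(P)
b, c_out = rng.standard_normal(3), rng.standard_normal(3)
h_true = np.array([c_out @ np.linalg.matrix_power(A, k) @ b for k in range(20)])

n = len(modes)
weights = fit_mode_weights(h_true[:n], modes)  # uses only n target samples
h_hat = vandermonde(modes, 20) @ weights       # reconstructs all 20 samples
print(np.max(np.abs(h_hat - h_true)))          # round-off level
```

With noisy data one would fit more than $n$ samples in the same least-squares call; the square case is shown only to make the sample count explicit.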

Abstract

In this paper, we study a transfer learning framework for Linear Quadratic Regulator (LQR) control, where (i) the dynamics of the system of interest (target system) are unknown and only a short trajectory of impulse responses from the target system is provided, and (ii) impulse responses are available from $N$ source systems with different dynamics. We show that the LQR controller can be learned from a sufficiently long trajectory of impulse responses. Further, a transferable mode set can be identified using the available data from source systems and the target system, enabling the reconstruction of the target system's impulse responses for controller design. By leveraging data from source systems, we show that the sample complexity for synthesizing the LQR controller can be reduced by $50 \%$. Algorithms and numerical examples are provided to demonstrate the implementation of the proposed transfer control framework.
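The $50\%$ figure can be read against a classical baseline: without any prior knowledge of the modes, Prony's method recovers the $n$ modes of a scalar impulse response from $2n$ samples, whereas fitting weights to known modes needs only $n$. The sketch below implements that classical $2n$-sample baseline under stated assumptions (scalar response, $n$ distinct modes, all mode weights nonzero, noise-free data). It is a textbook technique used for illustration, not the paper's transfer algorithm, and `prony_modes` is a hypothetical name.

```python
import numpy as np

def prony_modes(h, n):
    """Recover n modes from the first 2n samples of h[k] = sum_i c_i * lam_i**k.

    Such a sequence obeys h[k+n] + a_1 h[k+n-1] + ... + a_n h[k] = 0, so the
    n equations for k = 0..n-1 (which use h[0..2n-1]) determine a_1..a_n, and
    the modes are the roots of z**n + a_1 z**(n-1) + ... + a_n.
    """
    H = np.array([[h[k + n - 1 - j] for j in range(n)] for k in range(n)])
    rhs = -np.array([h[k + n] for k in range(n)])
    a = np.linalg.solve(H, rhs)
    return np.roots(np.concatenate(([1.0], a)))

true_modes = np.array([0.9, 0.5, -0.3])
weights = np.array([1.0, -2.0, 0.5])  # hypothetical nonzero mode weights
n = len(true_modes)
h = np.array([weights @ true_modes**k for k in range(2 * n)])  # exactly 2n samples

est_modes = np.sort(prony_modes(h, n).real)
print(est_modes)
```

The linear-recurrence step needs $n$ equations, each consuming $n+1$ consecutive samples, which is where the $2n$ count comes from.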

Paper Structure

This paper contains 15 sections, 5 theorems, 41 equations, 2 figures, 1 algorithm.

Key Result

Theorem 3.1

(Data-driven optimal controller) The controller gain ${K}_{\textup{LQR}}^{t}$ of the static controller admits an alternative expression in terms of the weight matrices $\bm{Q}_t = \mathrm{diag}(Q, \dots, Q)$ and $\bm{R}_t = \mathrm{diag}(R, \dots, R)$, each containing $T-t+1$ diagonal blocks, and the matrices $\bm{M}_t$ and $E$, which are constructed from the impulse responses $M(1:T)$ collected from the unknown system.
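The exact expression in Theorem 3.1 (including how $E$ is built) is not reproduced here. As a hedged illustration of the underlying idea, that a finite-horizon LQR gain can be written in closed form from stacked impulse responses and block-diagonal weights, the sketch below uses the standard batch least-squares form of finite-horizon LQR and checks it against the backward Riccati recursion. It assumes full-state output ($C = I$), so the impulse responses are $M(k) = A^{k-1}B$, and, unlike the paper's data-driven setting, it still uses $A$ to build the free-response matrix. All function names are hypothetical.

```python
import numpy as np

def impulse_responses(A, B, T):
    """Markov parameters M(k) = A^(k-1) B for k = 1..T (full-state output assumed)."""
    Ms, Ak = [], np.eye(A.shape[0])
    for _ in range(T):
        Ms.append(Ak @ B)
        Ak = A @ Ak
    return Ms

def batch_gain(A, B, Q, R, T):
    """Gain at t = 0 from the stacked (batch) least-squares form of finite-horizon LQR."""
    n, m = B.shape
    M = impulse_responses(A, B, T)
    # Block lower-triangular Toeplitz matrix: x_{k+1} depends on u_j via M(k+1-j)
    S = np.zeros((n * T, m * T))
    for k in range(T):
        for j in range(k + 1):
            S[k * n:(k + 1) * n, j * m:(j + 1) * m] = M[k - j]
    # Free response of x_1..x_T to the initial state x_0 (needs A in this sketch)
    Gamma = np.vstack([np.linalg.matrix_power(A, k) for k in range(1, T + 1)])
    Qbar = np.kron(np.eye(T), Q)  # diag(Q, ..., Q), T blocks
    Rbar = np.kron(np.eye(T), R)  # diag(R, ..., R), T blocks
    # Optimal open-loop inputs: U* = -(S' Qbar S + Rbar)^{-1} S' Qbar Gamma x_0
    K_all = np.linalg.solve(S.T @ Qbar @ S + Rbar, S.T @ Qbar @ Gamma)
    return K_all[:m, :]  # first block row acts on x_0

def riccati_gain(A, B, Q, R, T):
    """Model-based reference: backward Riccati recursion with terminal weight Q."""
    P = Q.copy()
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

rng = np.random.default_rng(0)
A = 0.5 * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 2))
Q, R, T = np.eye(3), np.eye(2), 10

K_batch = batch_gain(A, B, Q, R, T)
K_ric = riccati_gain(A, B, Q, R, T)
print(np.max(np.abs(K_batch - K_ric)))  # round-off level
```

The agreement follows because the optimal open-loop input sequence from $x_0$ coincides with the closed-loop Riccati policy, so its first input is $-K_0 x_0$ for every $x_0$.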

Figures (2)

  • Figure 1: This figure shows the error $\|{K}_{\textup{LQR}}^* - {K}_{\textup{LQR}}^0 \|$ as a function of $T$. The error decays exponentially as the number of impulse responses increases, which implies that the data-driven output-feedback expression reconstructs the output-feedback LQR gain exactly in the limit of large $T$.
  • Figure 2: This figure shows the error $\|{K}_{\textup{LQR}}^* - {K}_{\textup{LQR}}^0 \|$ for unknown target systems with different values of $Z(\hat{\bm \alpha})$. The error is small for target systems with small values of $Z(\hat{\bm \alpha})$.
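Assuming that ${K}_{\textup{LQR}}^0$ in Figure 1 behaves like a finite-horizon gain over $T$ steps and ${K}_{\textup{LQR}}^*$ like the infinite-horizon gain, exponential decay is what the backward Riccati recursion predicts for a stabilizable and detectable system. The sketch below checks this on a hypothetical two-state, open-loop-unstable system. It is model-based rather than data-driven, and it uses a very long horizon as a stand-in for the infinite-horizon gain.

```python
import numpy as np

def finite_horizon_gain(A, B, Q, R, T):
    """Backward Riccati recursion with terminal weight Q; returns the gain at t = 0."""
    P = Q.copy()
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Hypothetical controllable pair (A, B); A has an unstable eigenvalue 1.1
A = np.array([[1.1, 0.3], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

K_inf = finite_horizon_gain(A, B, Q, R, 500)  # long horizon as a proxy for K*
horizons = [2, 5, 10, 20, 40]
errors = [np.linalg.norm(K_inf - finite_horizon_gain(A, B, Q, R, T)) for T in horizons]
for T, e in zip(horizons, errors):
    print(f"T = {T:3d}  error = {e:.2e}")
```

On a log scale the printed errors should fall roughly along a straight line, with a slope set by the spectral radius of the closed-loop matrix $A - BK$.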

Theorems & Definitions (8)

  • Theorem 3.1
  • Remark 1
  • Lemma 3.2
  • Definition 1
  • Lemma 4.1
  • Lemma 4.2
  • Proof
  • Lemma 4.3