Policy Optimization for PDE Control with a Warm Start

Xiangyuan Zhang, Saviz Mowlavi, Mouhacine Benosman, Tamer Başar

TL;DR

This work augments the reduce-then-design procedure with a policy optimization step, which fine-tunes the model-based controller to compensate for the modeling error from dimensionality reduction, and offers a cost-effective alternative to PDE control using end-to-end reinforcement learning.

Abstract

Dimensionality reduction is crucial for controlling nonlinear partial differential equations (PDEs) through a "reduce-then-design" strategy, which identifies a reduced-order model and then implements model-based control solutions. However, inaccuracies in the reduced-order model can substantially degrade controller performance, especially in PDEs with chaotic behavior. To address this issue, we augment the reduce-then-design procedure with a policy optimization (PO) step. The PO step fine-tunes the model-based controller to compensate for the modeling error from dimensionality reduction. This augmentation turns the overall strategy into reduce-then-design-then-adapt, where the model-based controller serves as a warm start for PO. Specifically, we study the state-feedback tracking control of PDEs, which aims to align the PDE state with a specific constant target subject to a linear-quadratic cost. Through extensive experiments, we show that a few iterations of PO can significantly improve the model-based controller performance. Our approach offers a cost-effective alternative to PDE control using end-to-end reinforcement learning.
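
The reduce-then-design-then-adapt idea can be illustrated with a minimal sketch on a toy problem. Everything below is an assumption made for illustration and is not the paper's setup: a small linear system (`A_true`, `B`) stands in for the discretized PDE, `A_hat` is a deliberately perturbed reduced-order model, the target is taken to be the origin (regulation rather than tracking), and the policy gradient is replaced by central finite differences of rollout costs with step halving, which still only queries the simulator.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 4, 2, 50                       # state dim, input dim, rollout horizon

# Hypothetical "true" dynamics standing in for the discretized PDE.
A_true = 1.02 * np.eye(n) + 0.1 * np.diag(np.ones(n - 1), k=1)
B = np.vstack([np.zeros((n - m, m)), np.eye(m)])
# Hypothetical reduced-order model with an artificial modeling error.
A_hat = A_true + 0.05 * rng.standard_normal((n, n))
Q, R = np.eye(n), 0.1 * np.eye(m)        # linear-quadratic cost weights


def lqr_gain(A, B, Q, R, iters=500):
    """Design step: iterate the discrete Riccati recursion, return the gain."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


X0 = rng.standard_normal((n, 10))        # fixed batch of initial states


def cost(K):
    """Average finite-horizon LQ cost of u = -K x on the true system."""
    X, total = X0.copy(), 0.0
    for _ in range(T):
        U = -K @ X
        total += np.sum(X * (Q @ X)) + np.sum(U * (R @ U))
        X = A_true @ X + B @ U
    return total / X0.shape[1]


def fd_gradient(K, eps=1e-4):
    """Central finite-difference estimate of dJ/dK from rollouts only."""
    G = np.zeros_like(K)
    for idx in np.ndindex(*K.shape):
        dK = np.zeros_like(K)
        dK[idx] = eps
        G[idx] = (cost(K + dK) - cost(K - dK)) / (2 * eps)
    return G


# Warm start: controller designed on the inaccurate reduced-order model.
K0 = lqr_gain(A_hat, B, Q, R)
K, J_warm = K0.copy(), cost(K0)
J_po = J_warm

# Adapt step: a few gradient iterations on the true system.
for _ in range(40):
    G = fd_gradient(K)
    step = 1e-2
    while step > 1e-10:                  # halve the step until the cost drops
        K_new = K - step * G
        J_new = cost(K_new)
        if J_new < J_po:
            K, J_po = K_new, J_new
            break
        step *= 0.5

print(f"warm-start cost: {J_warm:.4f}")
print(f"cost after PO:   {J_po:.4f}")
```

Because an update is accepted only when the rollout cost decreases, the fine-tuned gain is never worse than the warm start on the sampled initial states. A stochastic zeroth-order or policy-gradient estimator could be substituted for `fd_gradient` without changing the overall structure.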

Paper Structure

This paper contains 10 sections, 14 equations, 4 figures, 2 algorithms.

Figures (4)

  • Figure 1: Top: Training curves of model-free PO for (P1)-(P3) averaged over $6$ random seeds with shaded regions denoting standard deviation. We normalize the vertical axis with respect to the cost of the LQT controller based on the DMDc model. For (P1), the training process of PO (without a warm start) instantly destabilizes with $\eta=10^{-4}$. Hence, we select $\eta=10^{-5}$, the largest learning rate under which PO can consistently decrease the cost. Bottom: The tracking costs of (P1)-(P3) with no control, LQT, LQT-PO after 40 iterations, and pure PO after 40 iterations. The costs are averaged over 10 trajectories with randomly sampled initial fields $u(x, t=0)$. The shaded region denotes the standard deviation.
  • Figure 2: (P1): Comparison of the model-based, PO with warm start, and pure PO control strategies, with a training budget of $40$ iterations for the latter two.
  • Figure 3: (P2): Comparison of the model-based, PO with warm start, and pure PO control strategies, with a training budget of $40$ iterations for the latter two.
  • Figure 4: (P3): Comparison of the model-based, PO with warm start, and pure PO control strategies, with a training budget of $40$ iterations for the latter two.
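
The aggregation described in the Figure 1 caption (normalizing by the LQT cost, then taking the mean and standard deviation across seeds) might be computed along the following lines. This is only a sketch: the array `costs` is synthetic placeholder data, `lqt_cost` is a hypothetical baseline value, and neither reflects the paper's results.

```python
import numpy as np

rng = np.random.default_rng(0)
num_seeds, num_iters = 6, 40             # seeds and PO iterations, as in the caption

lqt_cost = 12.5                          # hypothetical LQT baseline cost
# Synthetic placeholder training curves, shape (num_seeds, num_iters).
trend = 0.6 + 0.4 * np.exp(-np.arange(num_iters) / 10.0)
noise = 1.0 + 0.05 * rng.standard_normal((num_seeds, num_iters))
costs = lqt_cost * trend * noise

# Normalize by the LQT cost so that 1.0 means "as good as the warm start".
normalized = costs / lqt_cost

# Aggregate across seeds: the mean curve plus a one-standard-deviation band.
mean_curve = normalized.mean(axis=0)
std_curve = normalized.std(axis=0)
lower, upper = mean_curve - std_curve, mean_curve + std_curve

print(mean_curve.shape, lower.shape, upper.shape)
```

With this normalization, values below 1 indicate that the controller has improved on the LQT baseline.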