
Episodically adapted network-based controllers

Sruti Mallik, ShiNung Ching

TL;DR

This work addresses how to deploy control policies across a network of units that controls an unknown linear plant while remaining robust to unit failures. It introduces a model-free, network-based controller synthesized in the augmented state space $\bm{\Omega}_t=[\bm{\Psi}_t,\bm{\nu}_t,\mathbf{x}_t]^T$ and develops an online, episodic approximate policy-iteration framework (LSAPI) that learns the network dynamics without an explicit plant model. The method uses a quadratic cost and represents the state-action value function as $Q_{\pi}(\bm{\Omega},\mathbf{u})=\Theta^T\phi_t$, which enables recursive least-squares estimation of $\Theta$ and periodic policy updates; convergence is analyzed under a two-timescale interpretation. Numerical experiments on a point-mass navigation task and an inverted pendulum on a cart show that distributed policies are learned rapidly and remain notably robust to substantial unit lesions. The approach offers a tractable path to robust, distributed control in uncertain environments, though extensions to nonlinear dynamics and more biologically plausible implementations remain open.
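To make the estimation loop concrete, here is a minimal Python/NumPy sketch of least-squares approximate policy iteration on a linear plant whose matrices are used only to simulate data, never by the estimator. It is written under our own assumptions and is not the authors' implementation. The plant is a bare discretized point mass rather than the augmented network state $\bm{\Omega}_t$. Policies are linear ($\mathbf{u}=-K\mathbf{x}$), and the cost is undiscounted. All names (`quad_features`, `RLS`, `run_episode`, `lsapi`, `sigma`) are hypothetical. Recursive least squares updates $\Theta$ at every step (fast timescale), and the greedy policy update happens once per episode (slow timescale). This is the classic Bradtke-style Q-learning scheme for LQR.

```python
import numpy as np


def quad_features(z):
    """Quadratic basis phi(z): all products z_i * z_j with i <= j."""
    i, j = np.triu_indices(len(z))
    return z[i] * z[j]


def theta_to_H(theta, n):
    """Rebuild symmetric H such that z^T H z == theta^T quad_features(z)."""
    H = np.zeros((n, n))
    i, j = np.triu_indices(n)
    H[i, j] = theta
    return (H + H.T) / 2.0  # off-diagonal weights are split between H_ij and H_ji


class RLS:
    """Recursive least squares; P0 = p0 * I is equivalent to a tiny ridge penalty 1/p0."""

    def __init__(self, dim, p0=1e6):
        self.theta = np.zeros(dim)
        self.P = p0 * np.eye(dim)

    def update(self, psi, target):
        Ppsi = self.P @ psi
        k = Ppsi / (1.0 + psi @ Ppsi)                     # gain vector
        self.theta = self.theta + k * (target - psi @ self.theta)
        self.P = self.P - np.outer(k, Ppsi)               # P is symmetric, so psi^T P = (P psi)^T


def run_episode(A, B, Qc, Rc, K, steps, sigma, rng):
    """Fast timescale: estimate Q of policy u = -K x from one exploratory episode."""
    n, m = B.shape
    rls = RLS(len(quad_features(np.zeros(n + m))))
    x = rng.standard_normal(n)
    for _ in range(steps):
        u = -K @ x + sigma * rng.standard_normal(m)      # exploration keeps regressors exciting
        x_next = A @ x + B @ u                            # simulated plant step
        cost = x @ Qc @ x + u @ Rc @ u
        z = np.concatenate([x, u])
        z_next = np.concatenate([x_next, -K @ x_next])    # on-policy successor action
        # Bellman relation: Theta^T (phi(z_t) - phi(z_{t+1})) = c_t
        rls.update(quad_features(z) - quad_features(z_next), cost)
        x = x_next
    return theta_to_H(rls.theta, n + m)


def lsapi(A, B, Qc, Rc, K0, episodes=10, steps=200, sigma=1.0, seed=0):
    """Slow timescale: one greedy policy improvement per episode."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    K = K0.copy()
    for _ in range(episodes):
        H = run_episode(A, B, Qc, Rc, K, steps, sigma, rng)
        K = np.linalg.solve(H[n:, n:], H[n:, :n])         # u = -H_uu^{-1} H_ux x
    return K


# Hypothetical point mass (position, velocity); A and B only drive the simulation.
dt = 0.5
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
K0 = np.array([[0.5, 1.0]])  # an initially stabilizing gain
K = lsapi(A, B, np.eye(2), np.eye(1), K0)
print("learned gain:", K)
```

For a noise-free linear plant the Bellman relation used as the regression target holds exactly, so each episode's estimate is essentially the Q-function of the current policy, and the gain sequence should approach the Riccati-optimal gain within a few episodes. The exploration noise is what keeps the least-squares problem well conditioned.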

Abstract

We consider the problem of distributing a control policy across a network of interconnected units. Distributing controllers in this way has a number of potential advantages, especially in terms of robustness, as the failure of a single unit can be compensated for by the activity of others. However, it is not obvious a priori how such network-based controllers should be constructed for any given system and control objective. Here, we propose a synthesis procedure for obtaining dynamical networks that enact well-defined control policies in a model-free manner. We specifically consider an augmented state space consisting of both the plant state and the network states. Solving an optimization problem in this augmented state space both achieves the desired objective and specifies the network dynamics. Because of the analytical tractability of this method, we are able to provide convergence and robustness assessments.

Paper Structure

This paper contains 27 sections, 4 theorems, 42 equations, 10 figures, 1 table, 1 algorithm.

Key Result

Lemma III.1

If $\pi_1$ and $\pi_2$ are two stabilizing policies, then there exist constants $\beta_{\phi}$, $\beta_{c}$ and $\beta_{\Phi}$ such that $||\phi_t^1 - \phi_t^2|| \leq \beta_{\phi} ||\pi_1 - \pi_2||$, $||\mathbf{c}_t^1 - \mathbf{c}_t^2|| \leq \beta_{c} ||\pi_1 - \pi_2||$ and $||\Phi_t^1 - \Phi_t^2||$ …
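One way to see where a constant like $\beta_{\phi}$ can come from is to compare the two feature vectors at a common augmented state. This rests on our own assumptions, not necessarily the paper's definitions: $\phi$ is the unscaled upper-triangular quadratic basis of $\mathbf{z}=[\bm{\Omega};\mathbf{u}]$, and policy $i$ is linear, $\mathbf{u}=-K_i\bm{\Omega}$.

```latex
\begin{aligned}
\mathbf{z}^i &= \begin{bmatrix}\bm{\Omega}\\ -K_i\bm{\Omega}\end{bmatrix},\qquad
\mathbf{z}^1-\mathbf{z}^2 = \begin{bmatrix}\mathbf{0}\\ -(K_1-K_2)\bm{\Omega}\end{bmatrix},
\qquad \|\mathbf{z}^1-\mathbf{z}^2\| \le \|K_1-K_2\|\,\|\bm{\Omega}\|,\\[4pt]
\|\phi^1-\phi^2\| &\le \big\|\mathbf{z}^1\mathbf{z}^{1T}-\mathbf{z}^2\mathbf{z}^{2T}\big\|_F
= \big\|\mathbf{z}^1(\mathbf{z}^1-\mathbf{z}^2)^T+(\mathbf{z}^1-\mathbf{z}^2)\mathbf{z}^{2T}\big\|_F\\
&\le \big(\|\mathbf{z}^1\|+\|\mathbf{z}^2\|\big)\,\|K_1-K_2\|\,\|\bm{\Omega}\|.
\end{aligned}
```

The first inequality holds because $\phi$ keeps only a subset of the entries of $\mathbf{z}\mathbf{z}^T$. Stabilizing policies keep states in a bounded set with $\|\mathbf{z}^i\|\le\bar z$ and $\|\bm{\Omega}\|\le\bar\Omega$, so $\beta_{\phi}=2\bar z\,\bar\Omega$ would suffice for this single-state comparison. The lemma itself must also handle trajectories that differ under the two policies, which this sketch does not cover.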

Figures (10)

  • Figure 1: We consider the problem of constructing and parameterizing distributed, network-based controllers for unknown systems. A base network architecture is analytically developed and then adapted over successive learning episodes.
  • Figure 2: A. Network activity during the first 100 timesteps of episodes for (top) the point-mass system and (bottom) the inverted pendulum on a cart. (Activity is shown as changes with respect to a positive baseline.) B. Network connections evolving over episodes for (left) the point-mass system and (right) the inverted pendulum on a cart.
  • Figure 3: Schematic of tasks to be performed or systems to be controlled.
  • Figure 4: (Left) Feedback matrix for known model dynamics. (Middle) Feedback matrix from model-free policy iteration. (Right) Convergence behavior of the algorithm over iterations. A. Point-mass system and B. pendulum-on-a-cart system. The bottom-left panel marks the columns corresponding to the sub-matrices $\mathbf{W}_{\Psi}$, $\mathbf{W}_{\nu}$ and $\mathbf{W}_x$ of equation \ref{inner_time_scale_2}.
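The left-hand panels are a reference: the feedback matrix one would compute with the plant model in hand. The following is a minimal sketch of that reference computation. It assumes a discrete-time LQR problem and uses SciPy's `solve_discrete_are`. The point-mass matrices, the `dt` value, the augmented block sizes, and the helper names (`known_model_gain`, `relative_gain_error`, `split_feedback`) are hypothetical illustrations, not values or code from the paper.

```python
import numpy as np
from scipy.linalg import solve_discrete_are


def known_model_gain(A, B, Q, R):
    """Optimal gain K for u = -K x when (A, B) are known, via the discrete-time ARE."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


def relative_gain_error(K_est, K_ref):
    """Relative Frobenius-norm mismatch between a learned gain and the model-based gain."""
    return np.linalg.norm(K_est - K_ref) / np.linalg.norm(K_ref)


def split_feedback(W, sizes):
    """Split the columns of an augmented-state gain into consecutive blocks.

    With sizes = (dim Psi, dim nu, dim x) this returns (W_Psi, W_nu, W_x).
    """
    return np.split(W, np.cumsum(sizes)[:-1], axis=1)


# Hypothetical point mass (position, velocity) sampled every dt seconds.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
K_ref = known_model_gain(A, B, np.eye(2), np.eye(1))
print("known-model gain:", K_ref)

# Hypothetical augmented gain: 3 network units, 2 auxiliary states, 2 plant states.
W = np.arange(14.0).reshape(2, 7)
W_psi, W_nu, W_x = split_feedback(W, (3, 2, 2))
print(W_psi.shape, W_nu.shape, W_x.shape)
```

A gain learned without the model could then be scored with `relative_gain_error`, which is one simple way to quantify the agreement between the left and middle panels.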
  • Figure 5: Point mass system A-B. The network quickly learns the optimal strategy to perform the task. (The running cost in the first 25 timesteps of each iteration is shown inset in A). C. Activity of the network after convergence to an optimal strategy.

Theorems & Definitions (8)

  • proof
  • Lemma III.1
  • proof
  • Lemma III.2
  • proof
  • Lemma III.3
  • proof
  • Theorem III.4