Episodically adapted network-based controllers
Sruti Mallik, ShiNung Ching
TL;DR
This work addresses deploying control policies across a network of units to control unknown linear plants with robustness to unit failures. It introduces a model-free, networked controller synthesized in an augmented state space $\bm{\Omega}_t=[\bm{\Psi}_t,\bm{\nu}_t,\mathbf{x}_t]^T$ and develops an online, episodic approximate policy-iteration framework (LSAPI) that learns network dynamics without explicit plant models. The method relies on a quadratic cost and represents the state-action value as $Q_\pi(\Omega,u)=\Theta^T\phi_t$, enabling recursive least-squares estimation and periodic policy updates; convergence is analyzed under a two-timescale interpretation. Numerical experiments on a point-mass navigation task and an inverted pendulum on a cart demonstrate rapid learning of distributed policies, with notable robustness to substantial unit lesions. The approach offers a tractable path to robust, distributed control in uncertain environments, though extensions to nonlinear dynamics and more biologically plausible implementations remain open.
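To make the phrase "recursive least-squares estimation and periodic policy updates" concrete, the following is a minimal sketch of least-squares policy iteration with a quadratic Q-function of the form $\Theta^T\phi$. It is not the authors' algorithm: it omits the network states $\bm{\Psi}_t$ and $\bm{\nu}_t$ and works on the plant state alone, and the toy plant `A`, `B`, the unit cost weights, the exploration noise, and every function name are assumptions made for illustration. The learner only sees sampled transitions and stage costs, never `A` or `B` directly.

```python
import numpy as np

def quad_features(z):
    """phi(z): products z_i * z_j for i <= j, so that z^T H z = theta^T phi(z)."""
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(z.size)
    return z[i] * z[j]

def theta_to_matrix(theta, n):
    """Rebuild the symmetric matrix H from the weight vector theta."""
    H = np.zeros((n, n))
    H[np.triu_indices(n)] = theta
    return 0.5 * (H + H.T)  # off-diagonal weights are split between H_ij and H_ji

def rls_update(theta, P, psi, target):
    """One standard recursive least-squares step for target ~ psi^T theta."""
    P_psi = P @ psi
    gain = P_psi / (1.0 + psi @ P_psi)
    theta = theta + gain * (target - psi @ theta)
    P = P - np.outer(gain, P_psi)
    return theta, P

def greedy_gain(theta, n_x, n_u):
    """Minimise the quadratic Q over u: u = -K x with K = H_uu^{-1} H_ux."""
    H = theta_to_matrix(theta, n_x + n_u)
    return np.linalg.solve(H[n_x:, n_x:], H[n_x:, :n_x])

def evaluate_policy(step, cost, K, n_x, n_u, rng, n_steps=400, noise=1.0):
    """One episode: fit theta from the undiscounted Bellman equation
    Q(x_t, u_t) - Q(x_{t+1}, -K x_{t+1}) = cost(x_t, u_t)."""
    n = n_x + n_u
    theta = np.zeros(n * (n + 1) // 2)
    P = 1e6 * np.eye(theta.size)  # large initial covariance (assumed value)
    x = rng.standard_normal(n_x)
    for _ in range(n_steps):
        u = -K @ x + noise * rng.standard_normal(n_u)  # exploratory action
        x_next = step(x, u)
        u_next = -K @ x_next  # on-policy action, no exploration noise
        psi = (quad_features(np.concatenate([x, u]))
               - quad_features(np.concatenate([x_next, u_next])))
        theta, P = rls_update(theta, P, psi, cost(x, u))
        x = x_next
    return theta

def policy_iteration(step, cost, K0, n_x, n_u, n_episodes=10, seed=0):
    """Alternate episodic evaluation (RLS) with a greedy policy update."""
    rng = np.random.default_rng(seed)
    K = np.array(K0, dtype=float)
    for _ in range(n_episodes):
        theta = evaluate_policy(step, cost, K, n_x, n_u, rng)
        K = greedy_gain(theta, n_x, n_u)
    return K

# Hypothetical stable two-state plant; used only to generate samples.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
step = lambda x, u: A @ x + B @ u
cost = lambda x, u: x @ x + u @ u  # quadratic stage cost with unit weights

K_learned = policy_iteration(step, cost, K0=np.zeros((1, 2)), n_x=2, n_u=1)
print(K_learned)
```

In this sketch the policy is held fixed within an episode while the weights are estimated, and it changes only between episodes, which is one simple way to realise the separation of timescales mentioned above. Because the toy plant is deterministic and the initial gain is stabilizing, the learned gain can be checked against the one obtained from the discrete-time Riccati equation.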
Abstract
We consider the problem of distributing a control policy across a network of interconnected units. Distributing controllers in this way has a number of potential advantages, especially in terms of robustness, as the failure of a single unit can be compensated by the activity of others. However, it is not obvious a priori how such network-based controllers should be constructed for any given system and control objective. Here, we propose a synthesis procedure for obtaining dynamical networks that enact well-defined control policies in a model-free manner. We specifically consider an augmented state space consisting of both the plant state and the network states. Solution of an optimization problem in this augmented state space produces a desired objective and specification of the network dynamics. Because of the analytical tractability of this method, we are able to provide convergence and robustness assessments
