Second-Order MPC-Based Distributed Q-Learning

Samuel Mallick; Filippo Airaldi; Azita Dabiri; Bart De Schutter

TL;DR

This paper accelerates distributed MPC-based Q-learning for multi-agent systems with privacy constraints by introducing a second-order update. It derives a distributed Hessian-informed update that decomposes the global computation into per-agent calculations using consensus on a structured gradient/Hessian term, solving $(\bm{H}+\bm{\Lambda})\bm{d}=\bm{q}$ with $\bm{q} = -\frac{1}{T}\sum_{\tau} \boldsymbol{\delta} \nabla_\theta Q_\theta(s_\tau,a_\tau)$ and $\bm{H} = \frac{1}{T}\big(\nabla_\theta Q_\theta \nabla_\theta Q_\theta^\top - \nabla_\theta^2 Q_\theta\big)$. Through consensus on a matrix $\bm{C}$, the update becomes $\bm{d}_i = -\tilde{\bm{K}}_i \bm{G}_i (\boldsymbol{\delta} - (\bm{I}+\bm{C})^{-1}\bm{C}\boldsymbol{\delta})$, enabling fully distributed computation. Simulations on a three-agent network show that the distributed second-order method matches centralized second-order performance and outperforms the first-order variant, with communication scaling as $O(T^2)$ and remaining independent of the network size $M$. These results indicate significantly faster and more stable learning for distributed MPC-based RL, while preserving locality and privacy.
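The centralized form of the regularized linear system in the TL;DR can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes that the TD error is a scalar per sample, that the $\frac{1}{T}$ factor in the Hessian term denotes an average over the $T$ samples, and that the regularizer is a scaled identity. The function name `second_order_direction` and the parameter `reg` are hypothetical.

```python
import numpy as np

def second_order_direction(deltas, grads, hessians, reg=1e-3):
    """Solve (H + Lambda) d = q for the direction d (centralized sketch).

    deltas:   (T,) TD errors, assumed to be one scalar per sample
    grads:    (T, n) gradients of Q with respect to theta, one row per sample
    hessians: (T, n, n) Hessians of Q with respect to theta, one per sample
    reg:      hypothetical regularization weight, Lambda = reg * I (assumption)
    """
    deltas = np.asarray(deltas, dtype=float)
    grads = np.asarray(grads, dtype=float)
    hessians = np.asarray(hessians, dtype=float)
    T, n = grads.shape

    # q = -(1/T) * sum_tau delta_tau * grad_tau
    q = -(deltas[:, None] * grads).sum(axis=0) / T

    # H = (1/T) * sum_tau (grad_tau grad_tau^T - hess_tau),
    # assuming the 1/T factor averages over the samples
    outer = np.einsum("ti,tj->ij", grads, grads)
    H = (outer - hessians.sum(axis=0)) / T

    # Regularized solve; np.linalg.solve avoids forming an explicit inverse
    return np.linalg.solve(H + reg * np.eye(n), q)

# Tiny example with two samples and two parameters
d = second_order_direction(
    deltas=[2.0, 4.0],
    grads=np.eye(2),
    hessians=np.zeros((2, 2, 2)),
    reg=0.5,
)
```

In the distributed setting described above, no single agent would assemble `H` and `q` like this. The sketch only shows the quantity that the per-agent updates are meant to reproduce.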

Abstract

The state of the art for model predictive control (MPC)-based distributed Q-learning is limited to first-order gradient updates of the MPC parameterization. In general, using second-order information can significantly improve the speed of convergence for learning, allowing the use of higher learning rates without introducing instability. This work presents a second-order extension to MPC-based Q-learning with updates distributed across local agents, relying only on locally available information and neighbor-to-neighbor communication. In simulation, the approach is demonstrated to significantly outperform first-order distributed Q-learning.
