Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent

Gangshan Jing, He Bai, Jemin George, Aranya Chakrabortty, Piyush K. Sharma

TL;DR

The paper tackles the challenge of high-variance gradient estimation in large-scale distributed reinforcement learning by introducing an asynchronous, zeroth-order, block-coordinate approach that leverages the network structure of multi-agent systems (MAS). Each agent estimates its gradient from local costs without global consensus, enabling cluster-based parallel updates and accelerated convergence. The method is applied to model-free distributed LQR, with carefully designed local costs and a learning graph to ensure compatibility with global objectives, along with convergence and variance analyses. Simulation results on multi-robot formation control and scalability tests demonstrate faster convergence and reduced gradient variance compared with centralized ZOO, highlighting practical viability for large networks and privacy preservation.

Abstract

Recently introduced distributed zeroth-order optimization (ZOO) algorithms have shown their utility in distributed reinforcement learning (RL). Unfortunately, in the gradient estimation process, almost all of them require random samples with the same dimension as the global variable and/or require evaluation of the global cost function, which may induce high estimation variance for large-scale networks. In this paper, we propose a novel distributed zeroth-order algorithm by leveraging the network structure inherent in the optimization objective, which allows each agent to estimate its local gradient by local cost evaluation independently, without use of any consensus protocol. The proposed algorithm exhibits an asynchronous update scheme, and is designed for stochastic non-convex optimization with a possibly non-convex feasible domain based on the block coordinate descent method. The algorithm is later employed as a distributed model-free RL algorithm for distributed linear quadratic regulator design, where a learning graph is designed to describe the required interaction relationship among agents in distributed learning. We provide an empirical validation of the proposed algorithm to benchmark its performance on convergence rate and variance against a centralized ZOO algorithm.
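To make the idea of local, block-wise zeroth-order gradient estimation concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm or its LQR setting. It assumes a toy quadratic cost on a chain graph, a two-point estimator, and one randomly chosen active agent per step. All names (`local_cost`, `estimate_block_gradient`, `run`) and all parameter values are hypothetical and chosen for illustration only. The point it shows is that each agent perturbs only its own block and evaluates only the cost terms that depend on that block, so the random sample has the block's dimension rather than the global one.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def unit_vector(dim, rng):
    """Sample a direction uniformly from the unit sphere in R^dim."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def global_cost(x, targets):
    """Toy global cost (an assumption): tracking terms plus chain coupling."""
    n = len(x)
    tracking = sum(sq_dist(x[i], targets[i]) for i in range(n))
    coupling = sum(sq_dist(x[i], x[i + 1]) for i in range(n - 1))
    return tracking + coupling


def local_cost(i, x, targets):
    """Every term of the toy global cost that depends on block i."""
    cost = sq_dist(x[i], targets[i])
    for j in (i - 1, i + 1):  # chain neighbours of agent i
        if 0 <= j < len(x):
            cost += sq_dist(x[i], x[j])
    return cost


def estimate_block_gradient(i, x, targets, radius, rng):
    """Two-point zeroth-order estimate of the gradient w.r.t. block i.

    Only block i is perturbed, and only the local cost is evaluated.
    """
    dim = len(x[i])
    u = unit_vector(dim, rng)
    original = x[i]
    x[i] = [c + radius * du for c, du in zip(original, u)]
    cost_plus = local_cost(i, x, targets)
    x[i] = [c - radius * du for c, du in zip(original, u)]
    cost_minus = local_cost(i, x, targets)
    x[i] = original  # restore the unperturbed block
    scale = dim * (cost_plus - cost_minus) / (2.0 * radius)
    return [scale * du for du in u]


def run(num_agents=5, dim=2, iterations=2000, step=0.05, radius=0.1, seed=0):
    """Block coordinate descent: one randomly chosen agent updates per step."""
    rng = random.Random(seed)
    targets = [[float(i)] * dim for i in range(num_agents)]
    x = [[0.0] * dim for _ in range(num_agents)]
    initial = global_cost(x, targets)
    for _ in range(iterations):
        i = rng.randrange(num_agents)  # the single active block
        g = estimate_block_gradient(i, x, targets, radius, rng)
        x[i] = [c - step * gc for c, gc in zip(x[i], g)]
    return initial, global_cost(x, targets), x


if __name__ == "__main__":
    initial, final, _ = run()
    print(f"global cost: {initial:.3f} -> {final:.3f}")
```

In this toy problem the local cost of agent `i` contains every global term that involves block `i`, so its block gradient matches the global one. The learning graph described in the abstract serves a comparable purpose for the LQR problem, although its construction there is more involved than this chain example.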

Paper Structure

This paper contains 23 sections, 9 theorems, 102 equations, 5 figures, 1 table, 3 algorithms.

Key Result

Lemma 1

Given $r_i>0$, $i=1,...,N$, the following holds

Figures (5)

  • Figure 1: The architecture of distributed RL via asynchronous actions during two consecutive iterations.
  • Figure 2: Summary of definitions for the cost, sensing, and learning graphs.
  • Figure 3: With the same cost graph, three different sensing graphs result in three different learning graphs. Each node in each graph has a self-loop, which is omitted in this figure.
  • Figure 4: The sensing graph $\mathcal{G}_S$, cost graph $\mathcal{G}_C$ and the resulting learning graph $\mathcal{G}_L$ for the formation control problem.
  • Figure 5: (a) The group performance evolution of a 10-agent formation under the centralized ZOO algorithm, Algorithm \ref{alg:lqr1} without clustering and acceleration, without clustering but with acceleration, with clustering but without acceleration, and with both clustering and acceleration. The shaded areas denote the performance corresponding to the controllers obtained by perturbing the current control gain with 50 random samplings. (b) The trajectories of robots under the initial stabilizing controller. (c) The trajectories of robots under the controller learned by Algorithm \ref{alg:lqr1} with $s=3$, $w_i^k=0.5$. (d) The performance evolution of a 100-agent formation under Algorithm \ref{alg:lqr1}.

Theorems & Definitions (18)

  • Remark 1
  • Lemma 1
  • Lemma 2
  • Remark 2
  • Example 1
  • Theorem 1
  • Lemma 3
  • Lemma 4
  • Definition 1
  • Definition 2
  • ...and 8 more