A Deep Reinforcement Learning Approach to Efficient Distributed Optimization

Daokuan Zhu, Tianqi Xu, Jie Lu

TL;DR

A learning-based method for efficient distributed optimization over networked systems: a deep reinforcement learning framework adaptively configures a parameterized unifying paradigm that covers a wide range of decentralized first-order and second-order optimization algorithms.

Abstract

In distributed optimization, practical problem-solving performance is highly sensitive to algorithm selection, parameter settings, problem type, and data patterns, so finding a highly efficient method for a specific problem is often laborious. In this paper, we propose a learning-based method for efficient distributed optimization over networked systems. Specifically, we develop a deep reinforcement learning (DRL) framework for adaptive configuration within a parameterized unifying algorithmic form that incorporates a wide range of decentralized first-order and second-order optimization algorithms. We use local consensus and objective information to represent the regularities of problem instances and to trace the solving progress; this information constitutes the states observed by a DRL agent. The framework is trained using Proximal Policy Optimization (PPO) on a number of practical problem instances with similar structure but different problem data. Experiments on various smooth and non-smooth classes of objective functions demonstrate that the proposed learning-based method outperforms several state-of-the-art distributed optimization algorithms in terms of convergence speed and solution accuracy.
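To make the idea of "states built from local consensus and objective information" more concrete, the following is a minimal sketch under loudly stated assumptions. It does not reproduce the paper's parameterized base model or its PPO training. It swaps in plain decentralized gradient descent on a ring network with scalar quadratic local objectives, and every name in it (`ring_mixing_matrix`, `dgd_step`, `observe_state`, `run_episode`, `constant_policy`) is hypothetical. The only point is to show where a policy could observe a state and return a per-round configuration.

```python
# Illustrative sketch only: NOT the paper's base model or training procedure.
# Assumption: node i holds a scalar x_i and a local objective
# f_i(x) = 0.5 * (x - b_i)^2, and nodes communicate over a ring (n >= 3).


def ring_mixing_matrix(n):
    """Doubly stochastic weights on a ring: 1/3 to self and to each neighbor."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def dgd_step(x, b, W, step_size):
    """One communication round: average with neighbors, then take a local gradient step."""
    n = len(x)
    mixed = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Gradient of f_i at x_i is (x_i - b_i).
    return [mixed[i] - step_size * (x[i] - b[i]) for i in range(n)]


def observe_state(x, b):
    """Hypothetical state features: consensus error and average local objective."""
    n = len(x)
    mean_x = sum(x) / n
    consensus_error = sum((xi - mean_x) ** 2 for xi in x) / n
    avg_objective = sum(0.5 * (xi - bi) ** 2 for xi, bi in zip(x, b)) / n
    return consensus_error, avg_objective


def run_episode(b, policy, rounds=200):
    """Roll out the optimizer, letting the policy choose the step size each round."""
    n = len(b)
    W = ring_mixing_matrix(n)
    x = [0.0] * n
    for _ in range(rounds):
        state = observe_state(x, b)
        step_size = policy(state)  # the "action": a configuration for this round
        x = dgd_step(x, b, W, step_size)
    return x, observe_state(x, b)


def constant_policy(state):
    """Stand-in for a learned policy: ignores the state and returns a fixed step size."""
    return 0.05


if __name__ == "__main__":
    x, (consensus_error, avg_objective) = run_episode(
        [1.0, 2.0, 3.0, 4.0, 5.0], constant_policy
    )
    print(x, consensus_error, avg_objective)
```

In a learning-based setup of the kind the abstract describes, a trained policy would take the place of `constant_policy` and map the observed state to a configuration each round. How the authors define states, actions, and rewards is not specified in this summary, so those details are left out here.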

Paper Structure (15 sections, 22 equations, 5 figures)


Figures (5)

  • Figure 1: Interaction paradigm of the learning-based framework within a communication round. Circles marked with $i=1,\dots,N$ represent the computing nodes in the networked system.
  • Figure 2: (a) Convergence performance of base model (\ref{eqn:param_DAMM_x}), (\ref{eqn:param_DAMM_q}) under the baseline, the initial policy and the learned policy for solving (\ref{prob:least_square_lasso_reg}). (b) Convergence performance of base model (\ref{eqn:param_DAMM_x}), (\ref{eqn:param_DAMM_q}) under the learned policy and the fixed policy (i.e., $\pi(a^c \mid s)\equiv 1$), as well as the convergence performance of state-of-the-art algorithms applicable to (\ref{prob:least_square_lasso_reg}).
  • Figure 3: (a) Convergence performance of base model (\ref{eqn:param_DAMM_x}), (\ref{eqn:param_DAMM_q}) under the baseline, the initial policy and the learned policy for solving (\ref{prob:logistic_reg}). (b) Convergence performance of base model (\ref{eqn:param_DAMM_x}), (\ref{eqn:param_DAMM_q}) under the learned policy and the fixed policy (i.e., $\pi(a^c \mid s)\equiv 1$), as well as the convergence performance of SoPro.
  • Figure 4: (a) Convergence performance of base model (\ref{eqn:param_DAMM_x}), (\ref{eqn:param_DAMM_q}) under the baseline, the initial policy and the learned policy for solving (\ref{prob:l1_lasso_reg}). (b) Convergence performance of base model (\ref{eqn:param_DAMM_x}), (\ref{eqn:param_DAMM_q}) under the learned policy and the fixed policy (i.e., $\pi(a^c \mid s)\equiv 1$), as well as the convergence performance of state-of-the-art algorithms applicable to (\ref{prob:l1_lasso_reg}).
  • Figure 5: Convergence performance of the networked system under the learned policies for a longer time horizon compared with the training stage. "Least square" corresponds to the learned policy for solving (\ref{prob:least_square_lasso_reg}), "logistic regression" corresponds to that for solving (\ref{prob:logistic_reg}), and "$\ell_1$-regression" corresponds to that for solving (\ref{prob:l1_lasso_reg}). The beginning of the prolonged interval is marked with the vertical dashed line in red.

Theorems & Definitions (1)

  • Remark 1