A Deep Reinforcement Learning Approach to Efficient Distributed Optimization

Daokuan Zhu; Tianqi Xu; Jie Lu

TL;DR

A learning-based method for efficient distributed optimization over networked systems. A deep reinforcement learning framework adaptively configures a parameterized unifying paradigm that covers a wide range of decentralized first-order and second-order optimization algorithms.

Abstract

In distributed optimization, practical performance is highly sensitive to algorithm selection, parameter settings, problem type, and data pattern, so finding a highly efficient method for a given problem is often laborious. In this paper, we propose a learning-based method to achieve efficient distributed optimization over networked systems. Specifically, we develop a deep reinforcement learning (DRL) framework for adaptive configuration within a parameterized unifying algorithmic form, which covers a wide range of decentralized first-order and second-order optimization algorithms. We exploit local consensus and objective information to represent the regularities of problem instances and to track the solving progress; together these constitute the states observed by the DRL agent. The framework is trained using Proximal Policy Optimization (PPO) on a number of practical problem instances that share a similar structure but differ in problem data. Experiments on various smooth and non-smooth classes of objective functions demonstrate that the proposed learning-based method outperforms several state-of-the-art distributed optimization algorithms in terms of convergence speed and solution accuracy.
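The abstract names PPO as the training algorithm. As a reminder of what that involves, the following is a minimal sketch of the standard clipped surrogate objective from the PPO literature, written in plain Python. The function name `ppo_clip_objective`, the default `eps=0.2`, and the list-based inputs are illustrative assumptions and are not taken from the paper; the paper's actual states, actions, and rewards are not reproduced here.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to be maximized).

    ratios:     probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages: advantage estimates, one per sample
    eps:        clipping radius (0.2 is a common default, assumed here)
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("at least one sample is required")

    total = 0.0
    for r, adv in zip(ratios, advantages):
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(r * adv, clipped * adv)
    return total / len(ratios)
```

With a positive advantage, a ratio above `1 + eps` earns no extra credit; with a negative advantage, a ratio below `1 - eps` does not reduce the penalty. This is what keeps each policy update close to the policy that collected the data.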

Paper Structure (15 sections, 22 equations, 5 figures)

Introduction
Preliminaries on Deep Reinforcement Learning
Markov Decision Process
Policy Gradient
Proximal Policy Optimization
Problem Formulation
Deep Reinforcement Learning Approach
Base Model
Integration with DRL
Numerical Experiments
Linear Least Square Regression with Lasso Regularization
Logistic Regression
Linear $\ell_1$-Regression with Lasso Regularization
Generalization to subsequent iterations
Conclusion

Figures (5)

Figure 1: Interaction paradigm of the learning-based framework within a communication round. Circles marked with $i=1,\dots,N$ represent the computing nodes in the networked system.
Figure 2: (a) Convergence performance of the base model (\ref{eqn:param_DAMM_x})(\ref{eqn:param_DAMM_q}) under the baseline, the initial policy, and the learned policy for solving (\ref{prob:least_square_lasso_reg}). (b) Convergence performance of the base model (\ref{eqn:param_DAMM_x})(\ref{eqn:param_DAMM_q}) under the learned policy and the fixed policy (i.e., $\pi(a^c \mid s)\equiv 1$), as well as the convergence performance of state-of-the-art algorithms applicable to (\ref{prob:least_square_lasso_reg}).
Figure 3: (a) Convergence performance of the base model (\ref{eqn:param_DAMM_x})(\ref{eqn:param_DAMM_q}) under the baseline, the initial policy, and the learned policy for solving (\ref{prob:logistic_reg}). (b) Convergence performance of the base model (\ref{eqn:param_DAMM_x})(\ref{eqn:param_DAMM_q}) under the learned policy and the fixed policy (i.e., $\pi(a^c \mid s)\equiv 1$), as well as the convergence performance of SoPro.
Figure 4: (a) Convergence performance of the base model (\ref{eqn:param_DAMM_x})(\ref{eqn:param_DAMM_q}) under the baseline, the initial policy, and the learned policy for solving (\ref{prob:l1_lasso_reg}). (b) Convergence performance of the base model (\ref{eqn:param_DAMM_x})(\ref{eqn:param_DAMM_q}) under the learned policy and the fixed policy (i.e., $\pi(a^c \mid s)\equiv 1$), as well as the convergence performance of state-of-the-art algorithms applicable to (\ref{prob:l1_lasso_reg}).
Figure 5: Convergence performance of the networked system under the learned policies over a longer time horizon than the one used in the training stage. "Least square" corresponds to the learned policy for solving (\ref{prob:least_square_lasso_reg}), "logistic regression" corresponds to that for solving (\ref{prob:logistic_reg}), and "$\ell_1$-regression" corresponds to that for solving (\ref{prob:l1_lasso_reg}). The beginning of the prolonged interval is marked with a red vertical dashed line.

Theorems & Definitions (1)

Remark 1
