Resource Governance in Networked Systems via Integrated Variational Autoencoders and Reinforcement Learning

Qiliang Chen, Babak Heydari

Abstract

We introduce a framework that integrates variational autoencoders (VAEs) with reinforcement learning (RL) to balance system performance and resource usage in multi-agent systems by dynamically adjusting network structures over time. A key innovation of the method is its ability to handle the vast action space of possible network structures: a deep RL agent controls the continuous latent space into which the VAE encodes those structures, rather than choosing among the structures themselves. Evaluated on a modified OpenAI particle environment under various scenarios, the proposed method outperforms the baselines, and the learned behaviors reveal interesting strategies and insights.

Paper Structure

This paper contains 22 sections, 8 figures.

Figures (8)

  • Figure 1: The diagram shows a Variational Autoencoder applied to network topology. The encoder processes the adjacency matrix, producing Gaussian distribution parameters. The decoder samples from this distribution to reconstruct the adjacency matrix. Both components use deep neural networks.
  • Figure 2: The diagram depicts the VAE-RL framework interacting with a multi-agent system environment. The DDPG (Deep Deterministic Policy Gradient) manager directly controls the continuous latent variables, which the decoder maps to a reconstructed network topology. This topology serves as the final action within a Partially Observable Markov Decision Process (POMDP). The multi-agent system environment then processes the action and updates its state accordingly. The DDPG manager receives the resulting observations and rewards, which it uses to update its policy and choose subsequent actions. Throughout this process, the decoder's parameters remain fixed at their pre-trained values.
  • Figure 3: Results show performance and resource penalties for various methods with homogeneous agents (vision ranges 0.6-1.2) and heterogeneous agents in 4-agent systems. A star marker indicates the random policy baseline's overall performance.
  • Figure 4: Results show performance and resource penalties for various methods with homogeneous agents (vision ranges 0.6-1.2) and heterogeneous agents in 10-agent systems. A star marker indicates the random policy baseline's overall performance.
  • Figure 5: Communication network distribution over time is analyzed for homogeneous agents with vision ranges of 0.6, 0.8, 1.0, and 1.2 (subgraphs a-d). Networks are categorized as sparse ($\le$ 9 links), mid-dense (9-17 links), dense (18-26 links), or very dense ($\ge$ 27 links).
  • ...and 3 more figures
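
The decoding step described in Figures 1 and 2 (a continuous latent vector mapped by a frozen decoder to a network topology) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the decoder here is a single random linear layer standing in for the pre-trained deep network, the latent vector is hand-picked rather than produced by a DDPG manager, and the links are assumed to be undirected. The names `make_decoder`, `latent_to_adjacency`, and the 0.5 threshold are hypothetical. Under the undirected assumption, a 10-agent system has 45 possible links and therefore 2^45 candidate topologies, which is why acting in a low-dimensional latent space is attractive.

```python
import numpy as np


def make_decoder(latent_dim, n_agents, seed=0):
    """Return a frozen toy decoder: latent vector -> link probabilities.

    Assumption: random weights stand in for a pre-trained VAE decoder.
    """
    rng = np.random.default_rng(seed)
    n_pairs = n_agents * (n_agents - 1) // 2  # undirected links, no self-loops
    W = rng.normal(size=(n_pairs, latent_dim))
    b = rng.normal(size=n_pairs)

    def decode(z):
        logits = W @ z + b
        return 1.0 / (1.0 + np.exp(-logits))  # sigmoid -> per-link probability

    return decode


def latent_to_adjacency(z, decode, n_agents, threshold=0.5):
    """Decode a latent vector into a symmetric 0/1 adjacency matrix."""
    probs = decode(np.asarray(z, dtype=float))
    adj = np.zeros((n_agents, n_agents), dtype=int)
    upper = np.triu_indices(n_agents, k=1)
    adj[upper] = (probs > threshold).astype(int)
    return adj + adj.T  # mirror the upper triangle; diagonal stays zero


n_agents, latent_dim = 4, 2
decode = make_decoder(latent_dim, n_agents)

# Hypothetical manager action: a point in the continuous latent space.
z = np.array([0.3, -1.2])
adj = latent_to_adjacency(z, decode, n_agents)

# Number of active links, usable as a simple resource-cost proxy.
n_links = int(adj.sum() // 2)
print(adj)
print("links:", n_links)
```

The manager's action has only `latent_dim` continuous components regardless of how many agents there are, while the decoded topology still covers every possible link.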