Multi-agent reinforcement learning strategy to maximize the lifetime of Wireless Rechargeable

Bao Nguyen

TL;DR

The thesis proposes a Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP) model that promotes cooperation among Mobile Chargers and identifies optimal charging locations from real-time network information. The model also allows reinforcement learning algorithms to be applied to different networks without extensive retraining.

Abstract

The thesis proposes a generalized charging framework in which multiple Mobile Chargers (MCs) maximize network lifetime while ensuring target coverage and connectivity in large-scale Wireless Rechargeable Sensor Networks (WRSNs). A multi-point charging model improves charging efficiency by letting an MC charge multiple sensors simultaneously at each charging location. The problem is formulated as a Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP) that promotes cooperation among MCs and identifies optimal charging locations from real-time network information. This formulation also allows reinforcement learning algorithms to be applied to different networks without extensive retraining. To solve the Dec-POSMDP, the thesis proposes an asynchronous multi-agent reinforcement learning algorithm, AMAPPO, based on Proximal Policy Optimization (PPO).

Paper Structure

This paper contains 38 sections, 20 equations, 19 figures, 6 tables, 1 algorithm.

Figures (19)

  • Figure 1.1: Architecture of a Wireless Sensor Network
  • Figure 1.2: Sensor node architecture
  • Figure 1.3: An example of a WRSN
  • Figure 1.4: A WRSN with one MC for NLMCTC problem
  • Figure 2.1: In a Markov Decision Process (MDP), an agent interacts with an environment over time. At each timestep $t$, the agent observes the current state of the environment, denoted as $S_t$, and takes an action, denoted as $A_t$. As a result of the agent's action, the environment transitions from state $S_t$ to a new state $S_{t+1}$ and provides the agent with a reward of $R_{t+1}$. This interaction process continues in a loop, with the agent repeatedly observing states, taking actions, and receiving rewards, as represented by the sequence: $S_0, A_0, R_1, S_1, A_1, R_2, S_2, \dots$
  • ...and 14 more figures
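
The agent-environment loop described in the Figure 2.1 caption can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: `ToyEnv` is a hypothetical two-state environment, `policy` is a fixed hand-written rule, and `rollout` simply records the sequence $S_t, A_t, R_{t+1}, S_{t+1}$.

```python
from typing import List, Tuple


class ToyEnv:
    """Hypothetical two-state environment (illustration only, not from the thesis)."""

    def __init__(self) -> None:
        self.state = 0  # S_0

    def step(self, action: int) -> Tuple[int, float]:
        # Action 1 toggles the state; action 0 leaves it unchanged.
        next_state = self.state ^ action
        # Reward is 1.0 whenever the environment lands in state 1.
        reward = 1.0 if next_state == 1 else 0.0
        self.state = next_state
        return next_state, reward


def policy(state: int) -> int:
    # Fixed illustrative policy: move to state 1, then stay there.
    return 1 if state == 0 else 0


def rollout(env: ToyEnv, steps: int) -> List[Tuple[int, int, float, int]]:
    """Run the observe/act/reward loop and record (S_t, A_t, R_{t+1}, S_{t+1})."""
    trajectory = []
    state = env.state  # agent observes S_0
    for _ in range(steps):
        action = policy(state)                # agent picks A_t from S_t
        next_state, reward = env.step(action)  # environment returns S_{t+1}, R_{t+1}
        trajectory.append((state, action, reward, next_state))
        state = next_state
    return trajectory


if __name__ == "__main__":
    for transition in rollout(ToyEnv(), 3):
        print(transition)
```

A learning algorithm would replace the fixed `policy` with one that is updated from the recorded rewards; the loop structure itself stays the same.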

Theorems & Definitions (3)

  • Definition 2.2.1
  • Definition 2.2.2
  • Definition 2.2.3