Adaptive Target Localization under Uncertainty using Multi-Agent Deep Reinforcement Learning with Knowledge Transfer

Ahmed Alagha; Rabeb Mizouni; Shakti Singh; Jamal Bentahar; Hadi Otrok

TL;DR

This work tackles target localization under practical uncertainties, such as false alarms and potentially unreachable targets, by developing a Multi-Agent Deep Reinforcement Learning (MADRL) framework trained with Proximal Policy Optimization (PPO) and CNN-based policies. It introduces three action dimensions (Mobility, Detection, and Reachability), together with a team-based reward mechanism and centralized learning with distributed execution (CLDE), to promote cooperation among agents that observe the environment through 2D heatmap-style observations, enhanced by Convolutional AutoEncoder embeddings of the environment layout. When a target is deemed unreachable, a Deep Learning estimator augmented with Transfer Learning extrapolates the target coordinates, sharing features with the MADRL policy to reduce computation. The approach is validated in a radiation localization simulation, where it outperforms traditional and other DRL baselines, especially under uncertainty, while maintaining scalable training and low inference latency for real-time deployment.
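The three action dimensions can be pictured as one composite action per agent. The following is a minimal sketch, assuming each dimension is produced by its own output head and decoded greedily; the move set, the two-way Detection and Reachability heads, and all names (`MOVES`, `AgentAction`, `decode_action`) are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence

# Assumed move set for the Mobility dimension (not specified in this summary).
MOVES = ("stay", "up", "down", "left", "right")


@dataclass(frozen=True)
class AgentAction:
    """One composite action: Mobility, Detection, and Reachability."""
    move: str               # Mobility: where the agent goes next
    target_exists: bool     # Detection: agent's belief that a target exists
    target_reachable: bool  # Reachability: agent's belief that it can be reached


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def decode_action(move_logits: Sequence[float],
                  detect_logits: Sequence[float],
                  reach_logits: Sequence[float]) -> AgentAction:
    """Greedily decode three hypothetical policy heads into one action.

    detect_logits and reach_logits are assumed to be [no, yes] pairs.
    """
    return AgentAction(
        move=MOVES[argmax(move_logits)],
        target_exists=argmax(detect_logits) == 1,
        target_reachable=argmax(reach_logits) == 1,
    )


action = decode_action([0.1, 2.0, 0.3, 0.0, -1.0], [0.2, 0.9], [1.5, 0.4])
print(action)
```

During training a stochastic policy would sample from each head rather than take the argmax; greedy decoding is used here only to keep the sketch deterministic.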

Abstract

Target localization is a critical task in sensitive applications, where multiple sensing agents communicate and collaborate to identify the target location based on sensor readings. Existing approaches have investigated the use of Multi-Agent Deep Reinforcement Learning (MADRL) to tackle target localization. Nevertheless, these methods do not consider practical uncertainties, such as false alarms, where no target exists, or targets that are unreachable due to environmental complexities. To address these drawbacks, this work proposes a novel MADRL-based method for target localization in uncertain environments. The proposed method employs Proximal Policy Optimization to optimize the decision-making of the sensing agents, whose policy is represented as an actor-critic structure built from Convolutional Neural Networks. The agents' observations are designed to capture the essential information in the environment, and a team-based reward function is proposed to produce cooperative agents. The method covers three action dimensions, which control the agents' mobility to search the area for the target, detect its existence, and determine its reachability. Using Transfer Learning, a Deep Learning model builds on the knowledge of the MADRL model to accurately estimate the target location if it is unreachable, resulting in shared representations between the two models for faster learning and lower computational complexity. Collectively, the final combined model is capable of searching for the target, determining its existence and reachability, and estimating its location accurately. The proposed method is tested in a radioactive target localization environment and benchmarked against existing methods, showing its efficacy.
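For readers unfamiliar with Proximal Policy Optimization, the sketch below shows the standard clipped surrogate objective that PPO maximizes for the actor. It is a generic, dependency-free illustration, not the authors' implementation; the function name, the clipping value of 0.2, and the sample numbers are assumptions chosen for the example.

```python
from typing import Sequence


def ppo_clipped_objective(ratios: Sequence[float],
                          advantages: Sequence[float],
                          clip_eps: float = 0.2) -> float:
    """Mean PPO clipped surrogate over a batch of samples.

    ratios[i]     = pi_new(a_i | o_i) / pi_old(a_i | o_i)
    advantages[i] = advantage estimate for sample i
    clip_eps      = clipping range (0.2 is a common default, assumed here)
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal length")

    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(ratios)


# Two illustrative samples: one with a positive and one with a negative advantage.
objective = ppo_clipped_objective([1.5, 0.5], [1.0, -1.0])
print(objective)  # approximately 0.2
```

The clipping removes the incentive to move the new policy far from the one that collected the data, which is one reason PPO is a common choice when many agents share a policy trained centrally.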

Paper Structure

This paper contains 16 sections, 6 equations, 9 figures, and 2 tables.

Introduction
Problem Definition
Related Work
Proposed System
MADRL Formulation and Policy Optimization
Observation Space
Action Space
Policy Networks and Learning Process
Target Estimation with Transfer Learning
Simulation and Evaluation
Simulation Environment
MADRL Performance Analysis
Target Estimation
Benchmarks
Complexity Analysis
...and 1 more section

Figures (9)

Figure 1: Three examples showing the different scenarios to be addressed by the agents, including cases of (a) complex environments with obstacles, (b) unreachable targets, and (c) no targets due to false alarms.
Figure 2: An overview of the proposed model, which is to be deployed on each sensing agent.
Figure 3: The five observations collected by a sensing agent in a team of three agents. The original observations (top row) are processed to obtain the reduced observations (bottom row). The reduced observations are either local (green) or global (orange).
Figure 4: The actor architecture trained through MADRL (top), and the architecture of the estimation model trained through DL and TL (bottom).
Figure 5: The final model deployed on each of the sensing agents.
...and 4 more figures
