D2M2N: Decentralized Differentiable Memory-Enabled Mapping and Navigation for Multiple Robots

Md Ishat-E-Rabban; Pratap Tokekar

TL;DR

D2M2N tackles memory limitations in multi-robot navigation by equipping each robot with a differentiable memory that stores a compact occupancy belief as an embedding $m_i^t$ and by using a Value Iteration Network (VIN) as the action selector. The architecture separates memory maintenance (encoder–decoder–aggregator) from planning (VIN), so robots can communicate compressed embeddings to their neighbors while the whole pipeline stays end-to-end differentiable under centralized training with decentralized execution (CTDE). Empirical results show substantial gains over MAGAT, particularly in complex maps and under partial observability, with robustness to moderate sensor noise and improved performance in multi-goal tasks. The approach reduces communication overhead while preserving planning quality, suggesting practical benefits for scalable, decentralized multi-robot systems.

Abstract

Recently, a number of learning-based models have been proposed for multi-robot navigation. However, these models lack memory and rely only on the robots' current observations to plan their actions. They are unable to leverage past observations to plan better paths, especially in complex environments. In this work, we propose a fully differentiable and decentralized memory-enabled architecture for multi-robot navigation and mapping called D2M2N. D2M2N maintains a compact representation of the environment to remember past observations and uses a Value Iteration Network for complex navigation. We conduct extensive experiments to show that D2M2N significantly outperforms the state-of-the-art model in complex mapping and navigation tasks.

Paper Structure (21 sections, 3 equations, 5 figures, 4 tables)

Introduction
Related Works
Problem Formulation
Architecture
Overview
Memory Maintenance Module
Value Iteration Module
Training
Experiments
Experimental Setup
Compared Algorithms
Evaluation Metric
Dataset
Platform
A Qualitative Example
...and 6 more sections

Figures (5)

Figure 1: The proposed D2M2N architecture takes local observations as input and maintains a compact embedding $m_i^t$ of the map. This embedding is updated over time using new local observations and embeddings received from neighboring robots. The planner (VIN) module uses the decoded embedding to select optimal actions for the robot.
Figure 2: Training the encoder, decoder, and aggregator of the MM module.
Figure 3: (a) shows the occupancy grid. Robot 1 is marked as $r_1$, and $g_1$ is the goal location of $r_1$. (b) and (c) show the correct belief map of $r_1$ after 7 and 8 time-steps respectively, that is, the belief map that would result if no error were incurred during observation and message aggregation. (d) shows the actual belief map of $r_1$ after 7 time-steps. False positives and false negatives are marked with red and green boundaries respectively. During the 8th time-step, $r_1$ receives a message from $r_2$, which is the encoded version of $r_2$'s belief map after 7 time-steps, shown in (f). After making an observation and receiving the message from $r_2$, $r_1$'s actual belief map after time-step 8 is shown in (e). The VIN module takes $b_1^8$ and the goal location (marked by a green square) as input to compute the value map shown in (g), which is used to select the correct action for $r_1$. In (g), for each cell, we show the softmax probability of the action with the maximum Q-value, which is a measure of the model's confidence about an action. Here red and blue correspond to high and low confidence respectively. Observe that grid cells close to the goal cell have high confidence, while cells farther from it have low confidence.
Figure 4: Instances from Simple (left) and Complex (right) datasets. Green and red blocks represent source and goal cells respectively.
Figure 5: Effect of varying the number of robots (left), communication range (middle), and the size of the receptive field (right).
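
The confidence measure described in the Figure 3 caption (the softmax probability of the action with the maximum Q-value) can be illustrated with a small sketch. The code below is not the learned VIN from the paper: it substitutes classical, hand-written value iteration on a binary occupancy grid, and the action set, step cost of -1, discount factor, and the function names `q_values` and `confidence_map` are all assumptions made for illustration only.

```python
import numpy as np

# Assumed action set (not from the paper): stay, up, down, left, right.
ACTIONS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def q_values(occupancy, goal, gamma=0.9, iters=50):
    """Classical value iteration on a binary grid (1 = obstacle, 0 = free).

    A stand-in for the learned VIN. Every move costs -1 (assumed), and the
    goal cell is absorbing with its value pinned to 0.
    Returns Q with shape (num_actions, H, W).
    """
    h, w = occupancy.shape
    value = np.zeros((h, w))
    q = np.zeros((len(ACTIONS), h, w))
    for _ in range(iters):
        for a, (dr, dc) in enumerate(ACTIONS):
            for r in range(h):
                for c in range(w):
                    nr, nc = r + dr, c + dc
                    # Moving off the grid or into an obstacle leaves the robot in place.
                    if not (0 <= nr < h and 0 <= nc < w) or occupancy[nr, nc] == 1:
                        nr, nc = r, c
                    q[a, r, c] = -1.0 + gamma * value[nr, nc]
        value = q.max(axis=0)
        value[goal] = 0.0  # keep the goal absorbing
    return q


def confidence_map(q):
    """Per-cell softmax probability of the action with the maximum Q-value."""
    z = q - q.max(axis=0, keepdims=True)  # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=0, keepdims=True)
    return probs.max(axis=0)


if __name__ == "__main__":
    occupancy = np.zeros((5, 5), dtype=int)  # empty toy map
    q = q_values(occupancy, goal=(0, 0))
    conf = confidence_map(q)
    print(np.round(conf, 2))
```

In this toy setting, discounting shrinks the gaps between Q-values as the distance to the goal grows, so the confidence is higher near the goal than far from it, which is the same qualitative pattern the caption reports for panel (g).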
