DC-Ada: Reward-Only Decentralized Observation-Interface Adaptation for Heterogeneous Multi-Robot Teams

Saad Alqithami

Abstract

Heterogeneity is a defining feature of deployed multi-robot teams: platforms often differ in sensing modalities, ranges, fields of view, and failure patterns. Controllers trained under nominal sensing can degrade sharply when deployed on robots with missing or mismatched sensors, even when the task and action interface are unchanged. We present DC-Ada, a reward-only decentralized adaptation method that keeps a pretrained shared policy frozen and instead adapts compact per-robot observation transforms to map heterogeneous sensing into a fixed inference interface. DC-Ada is gradient-free and communication-minimal: it uses budgeted accept/reject random search with short common-random-number rollouts under a strict step budget. We evaluate DC-Ada against four baselines in a deterministic 2D multi-robot simulator covering warehouse logistics, search and rescue, and collaborative mapping, across four heterogeneity regimes (H0--H3) and five seeds with a matched budget of $200{,}000$ joint environment steps per run. Results show that heterogeneity can substantially degrade a frozen shared policy and that no single mitigation dominates across all tasks and metrics. Observation normalization is strongest for reward robustness in warehouse logistics and competitive in search and rescue, while the frozen shared policy is strongest for reward in collaborative mapping. DC-Ada offers a useful complementary operating point: it improves completion most clearly in severe coverage-based mapping while requiring only scalar team returns and no policy fine-tuning or persistent communication. These results position DC-Ada as a practical deploy-time adaptation method for heterogeneous teams.
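To make the adaptation loop concrete, below is a minimal sketch of the budgeted accept/reject random search with common-random-number (CRN) rollouts that the abstract describes. It is illustrative only: the names (`rollout_return`, `sigma`, `steps_per_rollout`) and the Gaussian proposal distribution are our assumptions, not the authors' implementation.

```python
import numpy as np

def dc_ada_sketch(rollout_return, phi0, budget=200_000,
                  steps_per_rollout=200, sigma=0.1, rng=None):
    """Illustrative reward-only accept/reject search (not the paper's code).

    `rollout_return(phi, seed)` is assumed to run one short rollout with the
    frozen shared policy and observation-transform parameters `phi`, with
    environment randomness fixed by `seed`, returning the scalar team return.
    """
    rng = rng or np.random.default_rng(0)
    phi = np.asarray(phi0, dtype=float)
    used = 0
    while used + 2 * steps_per_rollout <= budget:
        cand = phi + sigma * rng.standard_normal(phi.shape)  # random proposal
        seed = int(rng.integers(2**31))  # one seed shared by both rollouts (CRN)
        r_cur = rollout_return(phi, seed)   # paired evaluation of incumbent...
        r_new = rollout_return(cand, seed)  # ...and candidate under the same seed
        used += 2 * steps_per_rollout       # charge against the strict step budget
        if r_new > r_cur:                   # accept only on improvement
            phi = cand
    return phi
```

In this scheme only scalar returns cross the robot boundary, matching the communication-minimal design: each robot can run the loop locally on its own transform parameters.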

Paper Structure

This paper contains 70 sections, 2 theorems, 22 equations, 8 figures, 9 tables, 1 algorithm.

Key Result

Lemma 1

If $\mathrm{Cov}(F_T(\phi',\xi),F_T(\phi,\xi))>0$, then
$$\mathrm{Var}\big(F_T(\phi',\xi)-F_T(\phi,\xi)\big) < \mathrm{Var}\big(F_T(\phi',\xi')-F_T(\phi,\xi)\big),$$
where $\xi$ and $\xi'$ are independent. $\blacktriangleleft$
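The inequality is the classical common-random-numbers argument. Expanding the paired difference (and assuming, as is natural here, that $\xi'$ is distributed as $\xi$, so $\mathrm{Var}\,F_T(\phi',\xi')=\mathrm{Var}\,F_T(\phi',\xi)$):

$$\mathrm{Var}\big(F_T(\phi',\xi)-F_T(\phi,\xi)\big) = \mathrm{Var}\,F_T(\phi',\xi) + \mathrm{Var}\,F_T(\phi,\xi) - 2\,\mathrm{Cov}\big(F_T(\phi',\xi),F_T(\phi,\xi)\big),$$

while independence of $\xi'$ and $\xi$ makes the corresponding cross term in $\mathrm{Var}\big(F_T(\phi',\xi')-F_T(\phi,\xi)\big)$ vanish. A positive covariance under a shared seed therefore strictly shrinks the variance of the comparison used to accept or reject a candidate transform.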

Figures (8)

  • Figure 1: DC-Ada deployment and adaptation interface. Each robot adapts only a local observation transform $g_{\phi_i}$ in front of a shared frozen policy $\pi_\theta$. Adaptation uses scalar rollout return broadcasts $R(\tau)$ (once per rollout) and does not require sharing raw observations, maps, or gradients. (A minimal code sketch of this interface follows the figure list.)
  • Figure 2: Merged environment/heterogeneity illustration: (a) severe heterogeneity rendered scene (conceptual), (b) sensor configuration examples motivating heterogeneity, and (c) top-down arena layout for the warehouse-style environment used in our simulator.
  • Figure 3: Scaling with heterogeneity level (H0--H3) for the three tasks. Curves show mean reward under a fixed interaction budget ($B{=}200{,}000$ joint environment steps). A flatter slope indicates greater robustness to sensor mismatch for that task.
  • Figure 4: Compact per-domain reward summary by method and heterogeneity level (H0--H3). Bars are averaged over five seeds under the same matched budget of $B{=}200{,}000$ joint environment steps. This figure complements Fig. \ref{fig:scaling} by making within-level method ranking explicit.
  • Figure 5: Success rate by method and heterogeneity level (H0--H3) using the thresholds in Table \ref{tab:setup}. Success is a binary completion metric; Sec. \ref{sec:results_h3} reports continuous progress metrics and threshold sensitivity under severe heterogeneity.
  • ...and 3 more figures
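
Figure 1's interface reduces to a thin wrapper: each robot applies a small parametric transform to its raw observation before querying the shared frozen policy, and adaptation touches only the transform parameters. A minimal sketch, assuming an affine transform family (our choice for illustration; the paper's $g_{\phi_i}$ may be parameterized differently):

```python
import numpy as np

class AdaptedRobot:
    """One robot: local transform g_phi in front of a shared frozen policy.

    Only `scale` and `bias` (phi_i) are adapted; `policy` (pi_theta) is never
    modified, so every robot presents the same fixed inference interface.
    """

    def __init__(self, policy, obs_dim):
        self.policy = policy              # shared, frozen pi_theta
        self.scale = np.ones(obs_dim)     # per-robot transform parameters phi_i
        self.bias = np.zeros(obs_dim)

    def act(self, raw_obs):
        z = self.scale * np.asarray(raw_obs) + self.bias  # g_{phi_i}(o_i)
        return self.policy(z)             # frozen policy sees the fixed interface
```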

Theorems & Definitions (2)

  • Lemma 1: CRN reduces the variance of rollout differences
  • Proposition 1: False-accept probability under bounded returns