Trust-Based Social Learning for Communication (TSLEC) Protocol Evolution in Multi-Agent Reinforcement Learning

Abraham Itzhak Weinberg

TL;DR

TSLEC addresses the slow, independent emergence of communication protocols in MARL by introducing explicit, trust-modulated peer teaching. The method integrates Q-learning with emergent communication, dynamic trust networks, and mission adaptation to accelerate convergence while yielding compositional languages that remain robust under changing objectives. Key findings include a $23.9\%$ reduction in episodes-to-convergence relative to independent emergence, a compositionality score of $\mathcal{C}=0.38$, a decoding accuracy of $\Phi>0.867$ under dynamic objectives, and a strong correlation ($r=0.743$) between trust scores and teaching quality, indicating effective knowledge filtering. This work demonstrates that explicit social learning can speed up coordination in multi-agent systems and informs design principles for scalable, interpretable, and safe AI.

Abstract

Emergent communication in multi-agent systems typically occurs through independent learning, resulting in slow convergence and potentially suboptimal protocols. We introduce TSLEC (Trust-Based Social Learning with Emergent Communication), a framework where agents explicitly teach successful strategies to peers, with knowledge transfer modulated by learned trust relationships. Through experiments with 100 episodes across 30 random seeds, we demonstrate that trust-based social learning reduces episodes-to-convergence by 23.9% (p < 0.001, Cohen's d = 1.98) compared to independent emergence, while producing compositional protocols (C = 0.38) that remain robust under dynamic objectives (Phi > 0.867 decoding accuracy). Trust scores strongly correlate with teaching quality (r = 0.743, p < 0.001), enabling effective knowledge filtering. Our results establish that explicit social learning fundamentally accelerates emergent communication in multi-agent coordination.
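To make the idea of trust-modulated teaching concrete, the following is a minimal sketch of how a tabular Q-learner might blend a peer's advice into its own estimates. It is not the paper's update rule, which this summary does not state: the `Agent` class, the `receive_teaching` and `update_trust` methods, the neutral prior trust of 0.5, and all rate parameters are illustrative assumptions.

```python
from collections import defaultdict


class Agent:
    """Hypothetical tabular Q-learner with per-peer trust scores."""

    def __init__(self, name, n_actions, alpha=0.1, gamma=0.95):
        self.name = name
        self.alpha = alpha  # Q-learning step size (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        # Q-table: state -> list of action values, initialised to zero.
        self.q = defaultdict(lambda: [0.0] * n_actions)
        # Trust in each peer, starting from a neutral prior (assumption).
        self.trust = defaultdict(lambda: 0.5)

    def q_update(self, s, a, r, s_next):
        """Standard one-step Q-learning update from own experience."""
        target = r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

    def receive_teaching(self, teacher, s, a, teacher_q, rate=0.5):
        """Move Q(s, a) toward the teacher's value, scaled by trust.

        With trust 0 the advice is ignored; with trust 1 the student
        moves a fraction `rate` of the way toward the teacher's value.
        """
        w = rate * self.trust[teacher]
        self.q[s][a] += w * (teacher_q - self.q[s][a])

    def update_trust(self, teacher, helped, lr=0.2):
        """Exponential moving average of whether advice helped (assumed rule)."""
        outcome = 1.0 if helped else 0.0
        self.trust[teacher] += lr * (outcome - self.trust[teacher])


student = Agent("student", n_actions=2)
student.receive_teaching("teacher", s=0, a=1, teacher_q=1.0)
student.update_trust("teacher", helped=True)
```

In this sketch, advice from a peer whose teaching has helped in the past moves the student's estimates further than advice from an unreliable peer, which is one simple way to realise the knowledge filtering described above.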

Paper Structure

This paper contains 12 sections, 3 equations, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: Learning curves showing mean reward per episode with 95% confidence intervals. TSLEC (green) converges fastest, reaching near-optimal performance by episode 50. No Teaching (blue) shows slower improvement. Independent QL (red) plateaus at substantially lower reward.
  • Figure 2: Vocabulary size evolution showing rapid growth in episodes 1-30, then gradual expansion to convergence around 38 symbols. Error bands show standard deviation across 30 seeds.
  • Figure 3: Compositionality score evolution showing increasing systematic structure. Full System (green) and No Adaptation (orange) achieve higher final compositionality than No Teaching (blue), indicating social learning pressure for interpretable encodings.
  • Figure 4: Average trust score evolution showing convergence to high mutual trust ($\tau \approx 0.80$) by episode 50. Full System and No Adaptation exhibit nearly identical dynamics.
  • Figure 5: Performance distribution across conditions. Box plots show final reward (last 10 episodes). Overlapping distributions for Full System and No Adaptation explain non-significant difference.
  • ...and 2 more figures
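
The learning curves in Figure 1 report a mean reward per episode with 95% confidence intervals across seeds. A band of that kind could be computed as sketched below. The function names, the `rewards_by_seed[seed][episode]` layout, and the normal approximation (z = 1.96) are assumptions; the paper may use a different interval, such as one based on the t-distribution.

```python
import math
import statistics


def mean_ci95(values, z=1.96):
    """Return (mean, lower, upper) using a normal-approximation 95% CI."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two seeds to estimate spread")
    mean = statistics.fmean(values)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, mean - z * sem, mean + z * sem


def learning_curve_band(rewards_by_seed):
    """Compute a per-episode (mean, lower, upper) band across seeds.

    rewards_by_seed[seed][episode] holds the reward for one run
    (hypothetical layout; every run must have the same length).
    """
    per_episode = zip(*rewards_by_seed)  # regroup values by episode
    return [mean_ci95(list(episode)) for episode in per_episode]


# Toy data: 3 seeds, 2 episodes each (made-up numbers, not from the paper).
band = learning_curve_band([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
for episode, (mean, lower, upper) in enumerate(band, start=1):
    print(f"episode {episode}: {mean:.2f} [{lower:.2f}, {upper:.2f}]")
```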