Active Asymmetric Multi-Agent Multimodal Learning under Uncertainty

Rui Liu; Pratap Tokekar; Ming Lin

TL;DR

The paper tackles robust multi-agent multimodal learning under modality-specific uncertainty, a setting common in connected autonomous driving. It proposes Active Asymmetric MAML under Uncertainty (A2MAML), a three-stage framework: (i) stochastic local encoding, yielding a feature embedding and uncertainty pair $(\boldsymbol{f}_{i,m}, \boldsymbol{u}_{i,m})$ for agent $i$ and modality $m$; (ii) uncertainty-guided active selection, producing an uncertainty token $\rho_{i,m}$ and a differentiable Accept/Reject policy; and (iii) asymmetric Bayesian aggregation with inverse-variance weighting, $\boldsymbol{f} = \frac{\sum_{i,m} Z_{i,m}\omega_{i,m}\boldsymbol{f}_{i,m}}{\sum_{i,m} Z_{i,m}\omega_{i,m}}$, where $\omega_{i,m}=\exp(-\boldsymbol{u}_{i,m})$. Evaluated on accident-prone AUTOCASTSIM scenarios, A2MAML achieves up to $18.7\%$ higher mean accident detection rate (ADR) than baselines and reduces communication load through selective sharing, demonstrating robustness to sensor corruption and the practicality of modality-level, uncertainty-driven fusion.
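Stage (iii) amounts to an element-wise precision-weighted average over the accepted agent-modality pairs. The following framework-free Python sketch illustrates the aggregation formula in the TL;DR. Because this summary does not define them, it assumes that $Z_{i,m}\in\{0,1\}$ is the Accept/Reject decision from stage (ii) and that $\boldsymbol{u}_{i,m}$ is a per-element log-variance, so that $\omega_{i,m}=\exp(-\boldsymbol{u}_{i,m})$ is an inverse variance. The function name `bayesian_aggregate` is hypothetical.

```python
import math
from typing import Sequence


def bayesian_aggregate(
    feats: Sequence[Sequence[float]],
    uncs: Sequence[Sequence[float]],
    accept: Sequence[int],
) -> list[float]:
    """Hypothetical element-wise inverse-variance fusion.

    feats[k] and uncs[k] hold f_{i,m} and u_{i,m} for the k-th agent-modality
    pair, flattened to 1-D vectors of equal length. accept[k] is Z_{i,m},
    assumed here to be 0 (reject) or 1 (accept).
    """
    fused = []
    for d in range(len(feats[0])):
        num, den = 0.0, 0.0
        for f, u, z in zip(feats, uncs, accept):
            w = math.exp(-u[d])  # omega = exp(-u): an inverse variance if u is a log-variance
            num += z * w * f[d]
            den += z * w
        if den == 0.0:
            raise ValueError("no agent-modality pair was accepted")
        fused.append(num / den)
    return fused
```

With a clean ego feature ($u=0$) and a collaborator feature at $u=2$, the collaborator receives about $e^{-2}\approx 0.14$ of the ego's weight, so the fused vector stays close to the ego estimate; setting that collaborator's $Z$ to 0 removes it from both the numerator and the denominator.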

Abstract

Multi-agent systems are increasingly equipped with heterogeneous multimodal sensors, enabling richer perception but introducing modality-specific and agent-dependent uncertainty. Existing multi-agent collaboration frameworks typically reason at the agent level, assume homogeneous sensing, and handle uncertainty implicitly, limiting robustness under sensor corruption. We propose Active Asymmetric Multi-Agent Multimodal Learning under Uncertainty (A2MAML), a principled approach for uncertainty-aware, modality-level collaboration. A2MAML models each modality-specific feature as a stochastic estimate with uncertainty prediction, actively selects reliable agent-modality pairs, and aggregates information via Bayesian inverse-variance weighting. This formulation enables fine-grained, modality-level fusion, supports asymmetric modality availability, and provides a principled mechanism to suppress corrupted or noisy modalities. Extensive experiments on connected autonomous driving scenarios for collaborative accident detection demonstrate that A2MAML consistently outperforms both single-agent and collaborative baselines, achieving up to 18.7% higher accident detection rate.
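The abstract describes each modality-specific feature as a stochastic estimate with an uncertainty prediction, and the Figure 1 caption describes a Gaussian representation $(\mathbf{f}_{i,m}, \mathbf{u}_{i,m})$ plus a scalar token $\rho_{i,m}$ obtained by global average pooling over $\mathbf{u}_{i,m}$. The sketch below illustrates those two pieces on a toy feature map. It assumes, as a hypothetical choice, that $\mathbf{u}_{i,m}$ is a per-element log-variance and that stochasticity is realized with the standard reparameterization trick $\mathbf{f} + \exp(\mathbf{u}/2)\odot\boldsymbol{\epsilon}$; the encoder network itself is omitted, and the names `sample_feature` and `uncertainty_token` are illustrative.

```python
import math
import random

Matrix = list[list[float]]  # toy H x W map standing in for one channel of f or u


def sample_feature(f: Matrix, u: Matrix, rng: random.Random) -> Matrix:
    """Reparameterized Gaussian sample f + exp(u / 2) * eps, with eps ~ N(0, 1).

    Assumes (hypothetically) that u is a per-element log-variance.
    """
    return [
        [fv + math.exp(uv / 2.0) * rng.gauss(0.0, 1.0) for fv, uv in zip(f_row, u_row)]
        for f_row, u_row in zip(f, u)
    ]


def uncertainty_token(u: Matrix) -> float:
    """Global average pooling over u, giving the scalar token rho."""
    values = [v for row in u for v in row]
    return sum(values) / len(values)


rng = random.Random(0)
f = [[0.5, -0.2], [1.0, 0.3]]
u_clean = [[-2.0, -2.0], [-2.0, -2.0]]  # low predicted variance
u_noisy = [[1.0, 1.5], [2.0, 0.5]]      # high predicted variance, e.g. a corrupted input
print(uncertainty_token(u_clean), uncertainty_token(u_noisy))  # -2.0 1.25
noisy_sample = sample_feature(f, u_noisy, rng)
```

The larger token for the noisy map is what allows $\rho_{i,m}$ to serve as a compact noise-level estimate in the selection stage.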


Paper Structure (23 sections, 8 equations, 4 figures, 2 tables)


Introduction
Related Work
  Multi-Agent Collaboration
  Multimodal Learning
  Learning under Uncertainty in Collaborative Systems
Approach
  Problem Formulation
  Active Asymmetric MAML under Uncertainty
    Stochastic Local Encoding.
    Uncertainty-Guided Active Selection.
    Asymmetric Bayesian Aggregation.
Experiments
  Data Collection
  Experimental Setup
  Baselines
...and 8 more sections

Figures (4)

Figure 1: Overview of the A2MAML pipeline. $A_0$ denotes the ego agent, while $A_1$ to $A_N$ represent collaborative agents. The framework consists of three stages. In the first stage, given corrupted raw sensor observations $\mathbf{x}_{i,m}$ from agent $i$ under modality $m$, a Gaussian feature representation is extracted, comprising a feature embedding $\mathbf{f}_{i,m}$ and an uncertainty representation $\mathbf{u}_{i,m}$. In the second stage, a scalar uncertainty token $\rho_{i,m}$ is obtained via global average pooling over $\mathbf{u}_{i,m}$, serving as a compact estimate of the sensor’s noise level. A lightweight selection policy $\pi_\theta$ takes the ego agent’s uncertainty token $\rho_{0,m}$ and a collaborative agent’s token $\rho_{i,m}$ as input to determine whether to accept or reject that modality. In the third stage, the selected feature embeddings are fused via Bayesian aggregation and passed through a prediction head to produce the ego agent’s final prediction.
Figure 2: Accident-prone traffic scenarios in the AUTOCASTSIM benchmark for connected autonomous vehicles, including overtaking, left turn, and red-light violation (qiu2021autocast; liu2025caml). Additional details for each scenario are provided in the Appendix.
Figure 3: Qualitative analysis of active selection. An illustrative example demonstrates the effect of the proposed selection mechanism. The ego vehicle’s sensor data and Collaborator 1’s RGB observations are clean, while Collaborator 2’s RGB input is corrupted by Gaussian noise, resulting in a higher uncertainty token $\rho$. Consequently, the selection policy rejects the corrupted modality from Collaborator 2 while accepting the reliable features from Collaborator 1.
Figure 4: Performance analysis on ADR under varying noise levels. We evaluate robustness against the strong baseline V2X-ViT (xu2022v2x) by injecting sensor noise with corruption probability $p \in \{0.3, 0.5, 0.7\}$ during both training and testing. A2MAML consistently outperforms V2X-ViT across all noise levels and exhibits smaller performance degradation as noise increases, demonstrating improved robustness from modality-level active selection and uncertainty-aware Bayesian fusion.
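The selection stage described in the Figure 1 and Figure 3 captions can be sketched as follows. This summary does not specify the form of $\pi_\theta$ or the mechanism that makes the Accept/Reject decision differentiable, so the code makes hypothetical choices: a logistic policy over the ego token $\rho_{0,m}$ and a collaborator token $\rho_{i,m}$ with hand-set (not learned) weights, a Gumbel-sigmoid (binary Concrete) relaxation for training, and a hard 0.5 threshold at inference. It is one plausible instance under these assumptions, not the authors' implementation; `SelectionPolicy` and its methods are illustrative names.

```python
import math
import random
from dataclasses import dataclass


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


@dataclass
class SelectionPolicy:
    """Hypothetical logistic pi_theta over (rho_ego, rho_collab); weights are illustrative."""

    w_ego: float = 1.0      # higher ego uncertainty raises the accept logit
    w_collab: float = -1.0  # higher collaborator uncertainty lowers it
    bias: float = 1.0       # with these defaults: accept while rho_collab < rho_ego + 1

    def logit(self, rho_ego: float, rho_collab: float) -> float:
        return self.w_ego * rho_ego + self.w_collab * rho_collab + self.bias

    def accept_prob(self, rho_ego: float, rho_collab: float) -> float:
        return sigmoid(self.logit(rho_ego, rho_collab))

    def relaxed_decision(
        self, rho_ego: float, rho_collab: float, rng: random.Random, tau: float = 0.5
    ) -> float:
        # Training: Gumbel-sigmoid (binary Concrete) sample in (0, 1), smooth in the logit.
        u = rng.uniform(1e-6, 1.0 - 1e-6)
        logistic_noise = math.log(u) - math.log(1.0 - u)
        return sigmoid((self.logit(rho_ego, rho_collab) + logistic_noise) / tau)

    def hard_decision(self, rho_ego: float, rho_collab: float) -> int:
        # Inference: Z = 1 (accept) iff the accept probability exceeds 0.5.
        return int(self.logit(rho_ego, rho_collab) > 0.0)


# Toy version of the Figure 3 setting: clean ego, clean Collaborator 1, noisy Collaborator 2.
policy = SelectionPolicy()
rho_ego, rho_c1, rho_c2 = -1.0, -1.0, 2.0
print(policy.hard_decision(rho_ego, rho_c1), policy.hard_decision(rho_ego, rho_c2))  # 1 0
```

In a trained system the three weights would presumably be learned jointly with the encoder and prediction head; here they are fixed so that the toy reproduces the qualitative outcome of Figure 3 (Collaborator 1 accepted, Collaborator 2 rejected).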
