An Attention Mechanism for Robust Multimodal Integration in a Global Workspace Architecture
Roland Bertin-Johannet, Lara Scipio, Leopold Maytié, Rufin VanRullen
TL;DR
The paper tackles robustness in multimodal fusion by augmenting a Global Workspace (GW) architecture with a top-down modality attention mechanism. The pretrained multimodal workspace is frozen and a lightweight attention controller is added on top, so the system learns to re-weight modalities as their reliability varies without retraining the whole model. The approach uses a set-to-set broadcast formulation and combines translation, demi-cycle, cycle, and contrastive objectives. Empirical results on Simple Shapes and MM-IMDb 1.0 show improved noise robustness, cross-task and cross-modality generalization, and competitive performance on a real-world benchmark with favorable training efficiency. The work enables flexible, transferable modality selection within a GW framework, with potential extensions to dynamic data and additional modalities.
Abstract
Global Workspace Theory (GWT), inspired by cognitive neuroscience, posits that flexible cognition could arise via the attentional selection of a relevant subset of modalities within a multimodal integration system. This cognitive framework can inspire novel computational architectures for multimodal integration. Indeed, recent implementations of GWT have explored its multimodal representation capabilities, but the related attention mechanisms remain understudied. Here, we propose and evaluate a top-down attention mechanism to select modalities inside a global workspace. First, we demonstrate that our attention mechanism improves noise robustness of a global workspace system on two multimodal datasets of increasing complexity: Simple Shapes and MM-IMDb 1.0. Second, we highlight various cross-task and cross-modality generalization capabilities that are not shared by multimodal attention models from the literature. Comparing against existing baselines on the MM-IMDb 1.0 benchmark, we find our attention mechanism makes the global workspace competitive with the state of the art.
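To make the idea of attentional selection over modalities concrete, the following is a minimal sketch of how per-modality latent vectors could be re-weighted before being fused into a workspace state. It is not the authors' implementation: the scaled dot-product scoring, the function names (`attention_weights`, `fuse`), and the toy vectors are all assumptions chosen for illustration, and the latents stand in for the outputs of frozen, pretrained encoders.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_weights(query, keys):
    """One weight per modality from scaled dot-product scores.

    `query` and `keys` are hypothetical: here they are plain vectors,
    whereas a trained controller would produce them with learned layers.
    """
    scale = math.sqrt(len(query))
    return softmax([dot(query, k) / scale for k in keys])


def fuse(latents, weights):
    """Weighted sum of per-modality latents, giving one workspace vector."""
    dim = len(latents[0])
    return [sum(w * z[i] for w, z in zip(weights, latents)) for i in range(dim)]


# Toy example with two modalities (values are made up for illustration).
latents = [[1.0, 0.0], [0.0, 1.0]]   # stand-ins for frozen encoder outputs
keys = [[2.0, 0.0], [0.0, 0.0]]      # the second key carries no signal
query = [1.0, 0.0]

weights = attention_weights(query, keys)
workspace_state = fuse(latents, weights)
```

In this sketch only the query and key computation would be trainable, which mirrors the summary's point that the workspace itself stays frozen while a small controller learns which modality to trust. A modality whose key matches the query poorly receives a smaller weight and contributes less to the fused state.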
