Redefining Robot Generalization Through Interactive Intelligence
Sharmita Dey
TL;DR
The paper argues that robot foundation models are limited by their assumption of single-agent autonomy, particularly in semi-autonomous and wearable robotics, where continuous human collaboration is essential. It proposes a neuroscience-inspired, four-module architecture that treats humans and devices as co-adapting agents: a sensing module for multimodal inputs and language, an ad-hoc teamwork module for intent and belief-state modeling, a predictive world belief model for anticipatory control, and a memory/feedback module for long-term personalization. The approach emphasizes offline pre-training on diverse data, in-situ fine-tuning, safety, and interpretability, and extends to human-in-the-loop scenarios beyond cyborg systems. If realized, this interactive multi-agent framework could yield more robust, personalized, and anticipatory performance in wearable robotics, prosthetics, exoskeletons, and related semi-autonomous systems.
Abstract
Recent advances in large-scale machine learning have produced high-capacity foundation models capable of adapting to a broad array of downstream tasks. While such models hold great promise for robotics, the prevailing paradigm still portrays robots as single, autonomous decision-makers that perform tasks such as manipulation and navigation with limited human involvement. However, a large class of real-world robotic systems, including wearable robotics (e.g., prostheses, orthoses, exoskeletons), teleoperation, and neural interfaces, is semi-autonomous and requires ongoing interactive coordination with human partners, which challenges single-agent assumptions. In this position paper, we argue that robot foundation models must evolve toward an interactive multi-agent perspective in order to handle the complexities of real-time human-robot co-adaptation. We propose a generalizable, neuroscience-inspired architecture encompassing four modules: (1) a multimodal sensing module informed by sensorimotor integration principles, (2) an ad-hoc teamwork model reminiscent of joint-action frameworks in cognitive science, (3) a predictive world belief model grounded in internal model theories of motor control, and (4) a memory/feedback mechanism that echoes concepts of Hebbian and reinforcement-based plasticity. Although illustrated through the lens of cyborg systems, where wearable devices and human physiology are inseparably intertwined, the proposed framework is broadly applicable to robots operating in semi-autonomous or interactive contexts. By moving beyond single-agent designs, our position emphasizes how foundation models in robotics can achieve a more robust, personalized, and anticipatory level of performance.
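
To make the four-module decomposition concrete, the following is a minimal toy sketch of how the modules might be chained in one control step. It is an illustration under strong assumptions, not the architecture proposed in the paper: every name (Observation, sense, infer_intent, predict_next_angle, Memory, control_step), the single EMG channel, the threshold rule, and the fixed 5-degree step are hypothetical stand-ins for what would in practice be learned models.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Observation:
    """Fused sensor reading (hypothetical fields, for illustration only)."""
    emg: float          # normalized muscle activity in [0, 1]
    joint_angle: float  # degrees


def sense(raw: Dict[str, float]) -> Observation:
    """Module 1 (sensing): fuse raw channels into one observation."""
    emg = min(max(raw["emg"], 0.0), 1.0)  # clamp EMG to [0, 1]
    return Observation(emg=emg, joint_angle=raw["joint_angle"])


def infer_intent(obs: Observation, threshold: float) -> str:
    """Module 2 (ad-hoc teamwork): toy stand-in for intent inference."""
    return "flex" if obs.emg > threshold else "hold"


def predict_next_angle(obs: Observation, intent: str, step: float = 5.0) -> float:
    """Module 3 (predictive world model): toy forward model of the joint."""
    return obs.joint_angle + (step if intent == "flex" else 0.0)


@dataclass
class Memory:
    """Module 4 (memory/feedback): personalizes the intent threshold."""
    threshold: float = 0.5
    rate: float = 0.1
    history: List[float] = field(default_factory=list)

    def update(self, obs: Observation, intent: str, user_approved: bool) -> None:
        self.history.append(obs.emg)
        if user_approved:
            return  # no correction needed
        if intent == "flex":
            # Device flexed when the user did not want it: raise the threshold.
            self.threshold += self.rate * (1.0 - self.threshold)
        else:
            # Device held when the user wanted motion: lower the threshold.
            self.threshold -= self.rate * self.threshold


def control_step(raw: Dict[str, float], memory: Memory,
                 user_approved: bool) -> Tuple[str, float]:
    """Run one sense -> infer -> predict -> adapt cycle."""
    obs = sense(raw)
    intent = infer_intent(obs, memory.threshold)
    predicted = predict_next_angle(obs, intent)
    memory.update(obs, intent, user_approved)
    return intent, predicted


if __name__ == "__main__":
    memory = Memory()
    intent, angle = control_step({"emg": 0.6, "joint_angle": 10.0}, memory,
                                 user_approved=False)
    print(intent, angle, memory.threshold)
```

The point of the sketch is the data flow only: the human's feedback (user_approved) changes the device's later behavior through the memory module, which is the co-adaptation loop that the single-agent view leaves out.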
