Innate Motivation for Robot Swarms by Minimizing Surprise: From Simple Simulations to Real-World Experiments
Tanja Katharina Kaiser, Heiko Hamann
TL;DR
This paper introduces a task-agnostic intrinsic motivation framework for evolving swarm robot controllers by minimizing surprise. Surprise minimization is operationalized as maximizing the accuracy with which each agent predicts its own sensor data, using pairs of actor and predictor neural networks evolved with a simple genetic algorithm (GA). The approach is validated across a progression of environments, from simple simulations to realistic simulations and real-robot experiments, and yields robust, scalable, self-organizing emergent behaviors such as self-assembly and basic manipulation. Key findings include high prediction accuracy that coincides with coherent emergent patterns, robustness to sensor noise and damage, and partial scalability across swarm densities, along with a comparison against novelty search. The work shows that intrinsic motivation can guide open-ended discovery of swarm behaviors, and it outlines practical pathways for real-world deployment and for future integration with quality-diversity methods such as MAP-Elites to balance diversity and quality.
Abstract
Large-scale mobile multi-robot systems can be preferable to monolithic robots because of their higher potential for robustness and scalability. Developing controllers for multi-robot systems is challenging because the multitude of interactions is hard to anticipate and difficult to model. Automatic design using machine learning or evolutionary robotics seems to be an option for avoiding that challenge, but it brings the new challenge of designing reward or fitness functions. Generic reward and fitness functions seem unlikely to exist, and task-specific rewards often have undesired side effects. Approaches based on so-called innate motivation try to avoid the specific formulation of rewards and work instead with different drivers, such as curiosity. Our approach to innate motivation is to minimize surprise, which we implement by maximizing the accuracy of each swarm robot's sensor predictions using neuroevolution. A unique advantage of the swarm robot case is that swarm members populate the robot's environment and can trigger more active behaviors in a self-referential loop. We summarize our previous simulation-based results concerning behavioral diversity, robustness, scalability, and engineered self-organization, and put them into context. In several new studies, we analyze the influence of the optimizer's hyperparameters, the scalability of evolved behaviors, and the impact of realistic robot simulations. Finally, we present results using real robots that show how the reality gap can be bridged.
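To make the idea of "maximizing the accuracy of sensor predictions" concrete, the following is a minimal sketch of how such a fitness value could be computed for one robot. It assumes binary sensor values and defines accuracy as the mean agreement between predicted and observed values over all time steps and sensors. The function name `prediction_fitness` and the exact formula are illustrative assumptions, not taken from the text above.

```python
from typing import Sequence


def prediction_fitness(
    predictions: Sequence[Sequence[int]],
    observations: Sequence[Sequence[int]],
) -> float:
    """Mean prediction accuracy over all time steps and sensors.

    Assumption (hypothetical setup): sensors are binary (0 or 1), and
    predictions[t][r] is the value predicted for sensor r at time step t,
    while observations[t][r] is the value actually measured.
    A result of 1.0 means no surprise; 0.0 means every prediction was wrong.
    """
    if not predictions or len(predictions) != len(observations):
        raise ValueError("need equally many, non-empty prediction and observation steps")

    agreement = 0
    count = 0
    for predicted_step, observed_step in zip(predictions, observations):
        if len(predicted_step) != len(observed_step):
            raise ValueError("each time step needs one prediction per sensor")
        for predicted, observed in zip(predicted_step, observed_step):
            # For binary values, 1 - |p - s| is 1 on a match and 0 on a mismatch.
            agreement += 1 - abs(predicted - observed)
            count += 1

    if count == 0:
        raise ValueError("no sensor values to evaluate")
    return agreement / count


# Two time steps, two sensors: three of four predictions match.
example = prediction_fitness([[1, 0], [0, 0]], [[1, 0], [1, 0]])
```

In an evolutionary loop, a value like this could serve as the only selection criterion, which is what makes the setup task-agnostic: no behavior is rewarded directly, only the ability to predict one's own sensor readings.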
