Innate Motivation for Robot Swarms by Minimizing Surprise: From Simple Simulations to Real-World Experiments

Tanja Katharina Kaiser, Heiko Hamann

TL;DR

This paper introduces a task-agnostic intrinsic motivation framework for evolving swarm robotics controllers by minimizing surprise, operationalized as maximizing prediction accuracy of each agent’s sensor data via actor–predictor neural networks evolved with a simple GA. The approach is validated across a progression of environments from simple simulations to realistic simulations and real-robot experiments, demonstrating robust, scalable, and self-organizing emergent behaviors such as self-assembly and basic manipulation. Key findings include strong prediction accuracy leading to coherent emergent patterns, robustness to sensor noise and damage, and partial scalability across swarm densities, with comparative insights against novelty search. The work shows that intrinsic motivation can guide open-ended swarm behavior discovery and offers practical pathways for real-world deployment and future integration with quality-diversity methods like MAP-Elites to balance diversity and high quality.

Abstract

Applications of large-scale mobile multi-robot systems can be beneficial over monolithic robots because of higher potential for robustness and scalability. Developing controllers for multi-robot systems is challenging because the multitude of interactions is hard to anticipate and difficult to model. Automatic design using machine learning or evolutionary robotics seems to be an option to avoid that challenge, but brings the challenge of designing reward or fitness functions. Generic reward and fitness functions seem unlikely to exist, and task-specific rewards often have undesired side effects. Approaches of so-called innate motivation try to avoid the specific formulation of rewards and work instead with different drivers, such as curiosity. Our approach to innate motivation is to minimize surprise, which we implement by maximizing the accuracy of the swarm robot's sensor predictions using neuroevolution. A unique advantage of the swarm robot case is that swarm members populate the robot's environment and can trigger more active behaviors in a self-referential loop. We summarize our previous simulation-based results concerning behavioral diversity, robustness, scalability, and engineered self-organization, and put them into context. In several new studies, we analyze the influence of the optimizer's hyperparameters, the scalability of evolved behaviors, and the impact of realistic robot simulations. Finally, we present results using real robots that show how the reality gap can be bridged.
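
To make "maximizing the accuracy of sensor predictions" concrete, the following is a minimal sketch of how such a fitness value could be computed. It is not taken from the paper: the exact fitness formula is not given in this excerpt. The function name `prediction_accuracy`, the nested-list data layout, and the assumption that sensor values and predictions lie in [0, 1] are all illustrative choices.

```python
def prediction_accuracy(predictions, observations):
    """Mean of 1 - |p - s| over all time steps, agents, and sensors.

    Assumed layout (hypothetical): predictions[t][n][r] is the value that
    agent n predicted for sensor r at time step t, and observations[t][n][r]
    is the value actually sensed. All values are assumed to lie in [0, 1].
    """
    total = 0.0
    count = 0
    for pred_t, obs_t in zip(predictions, observations):
        for pred_n, obs_n in zip(pred_t, obs_t):
            for p, s in zip(pred_n, obs_n):
                total += 1.0 - abs(p - s)  # 1 for a perfect prediction, 0 for the worst
                count += 1
    return total / count


# Two time steps, one agent, two binary sensors: three of four predictions match.
predicted = [[[1, 0]], [[1, 1]]]
observed = [[[1, 0]], [[0, 1]]]
score = prediction_accuracy(predicted, observed)  # 0.75
```

With binary sensors this value is simply the fraction of correct predictions, so a higher score corresponds to less surprise. An evolutionary algorithm would use it as the fitness to maximize.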

Paper Structure (39 sections, 7 equations, 23 figures, 5 tables)

This paper contains 39 sections, 7 equations, 23 figures, 5 tables.

Figures (23)

  • Figure 1: Overview of this paper's studies. We showcase our minimize surprise approach in scenarios of increasing complexity: simple simulations, realistic simulations, and real robot hardware. In our simple simulations, we allow different degrees of freedom (DOF) for behaviors and observe major advantages, such as behavior diversity, scalability, and robustness. Experiments in realistic simulations are run in environments with and without manipulable boxes. We conclude with real-world multi-robot experiments. Studies presented for the first time in this paper are marked with a * [kaiser2021a].
  • Figure 2: Actor-predictor ANN pair of each swarm member in minimize surprise. The actor (a) is a single hidden layer feedforward ANN that outputs $M$ action values $a_0(t), \dots, a_{M-1}(t)$ (e.g., motor speeds). The predictor (b) has one recurrent hidden layer and outputs $R$ sensor value predictions $p_0(t+1),\dots,p_{R-1}(t+1)$ for time step $t+1$. Inputs to both networks are the $R$ sensor values $s_0(t),\dots,s_{R-1}(t)$ at time step $t$ and all or a subset of the action values: $a_0(t-1), \dots, a_{M-1}(t-1)$ of time step $t-1$ for the actor and $a_0(t), \dots, a_{M-1}(t)$ of time step $t$ for the predictor [kaiser2021a].
  • Figure 3: Sensor model for the simple simulations (Sec. \ref{section:AnalysisApproach}) with labels per sensor. The gray circle represents the agent and the arrow its heading [kaiser2021a].
  • Figure 4: Examples of emergent patterns in the self-assembly scenario: (a) lines, (b) pairs, (c) squares, (d) triangular lattice, (e) random dispersion, (f) aggregation, (g) clustering, (h) loose grouping, and (i) swirls. Agents and their headings are represented by black triangles. Green boxes represent arbitrary examples of the respective pattern to guide the eye [kaiser2021a].
  • Figure 5: Best fitness $F$ of 50 independent minimize surprise runs (a) over generations on the $12\times 12$ grid ($L=12$) and (b) for the last generation per grid size $L$, as well as (c) solution quality (i.e., percentage of agents positioned in the dominant structure) of the best individual per grid size $L$. Medians are indicated by the red bars [kaiser2021a].
  • ...and 18 more figures
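
The actor-predictor pair described in the Figure 2 caption can be sketched as a forward pass in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the class name `ActorPredictor`, the helper functions, the tanh activation, the hidden layer size, the random weight initialization, and the choice to feed all (rather than a subset of) action values into both networks are hypothetical. In the approach described above, the weights would be set by an evolutionary algorithm rather than drawn at random.

```python
import math
import random


def dense(weights, inputs):
    """One fully connected layer with tanh activation (activation is an assumption).

    Each weight row holds len(inputs) weights followed by one bias term.
    """
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + row[-1])
            for row in weights]


def random_weights(rng, n_out, n_in):
    """Random placeholder weights; evolution would normally supply these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


class ActorPredictor:
    """Hypothetical actor-predictor pair for one swarm member."""

    def __init__(self, n_sensors, n_actions, n_hidden, seed=0):
        rng = random.Random(seed)
        # Actor: sensors s(t) and previous actions a(t-1) -> hidden -> actions a(t).
        self.actor_hidden = random_weights(rng, n_hidden, n_sensors + n_actions)
        self.actor_out = random_weights(rng, n_actions, n_hidden)
        # Predictor: s(t), a(t), and its previous hidden state -> hidden -> p(t+1).
        self.pred_hidden = random_weights(rng, n_hidden, n_sensors + n_actions + n_hidden)
        self.pred_out = random_weights(rng, n_sensors, n_hidden)
        self.context = [0.0] * n_hidden        # recurrent state of the predictor
        self.prev_actions = [0.0] * n_actions  # a(t-1), zero before the first step

    def step(self, sensors):
        """Return (actions a(t), predictions p(t+1)) for sensor values s(t)."""
        sensors = list(sensors)
        actor_h = dense(self.actor_hidden, sensors + self.prev_actions)
        actions = dense(self.actor_out, actor_h)
        pred_h = dense(self.pred_hidden, sensors + actions + self.context)
        predictions = dense(self.pred_out, pred_h)
        self.context = pred_h        # carried over to the next time step
        self.prev_actions = actions
        return actions, predictions


pair = ActorPredictor(n_sensors=4, n_actions=2, n_hidden=3)
actions, predictions = pair.step([0, 1, 0, 1])
```

Comparing `predictions` against the sensor values observed at the next time step yields the prediction error that the approach seeks to minimize. Only the predictor keeps state between steps, which mirrors the caption's distinction between a feedforward actor and a recurrent predictor.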