
Emergent Braitenberg-style Behaviours for Navigating the ViZDoom 'My Way Home' Labyrinth

Caleidgh Bayer, Robert J. Smith, Malcolm I. Heywood

TL;DR

This paper tackles navigation in a partially observable, high-dimensional labyrinth by exploring whether Braitenberg-style reactive behaviours can emerge from simple, coevolved programs. It contrasts a memoryless DQN baseline with Tangled Program Graphs (TPG), showing that small, modular program graphs that index only a tiny fraction of the state space can navigate robustly in ViZDoom's My Way Home task. The results reveal Braitenberg-like policies, built from ensembles of context and action programs, that operate without explicit convolutional processing or memory and outperform the DQN baseline under the chosen setup. The work highlights the potential of structured evolutionary programming to yield simple, interpretable navigation strategies in complex environments, and suggests avenues for studying geometry-driven effects and generalization.
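To illustrate what "Braitenberg-style reactive behaviour" means, here is a minimal sketch of a classic Braitenberg vehicle rule applied to an image. It is not the policy evolved in the paper: the left/right split of the frame into two "sensors", the threshold, and the action names (`turn_left`, `turn_right`, `forward`) are all hypothetical. It only shows the defining property, a fixed memoryless mapping from current stimulus to action.

```python
from typing import List


def braitenberg_action(frame: List[List[float]], threshold: float = 0.05) -> str:
    """Map a greyscale frame (rows of intensities in [0, 1]) to one action.

    Hypothetical illustration: two 'sensors' sum the left and right halves of
    the frame, and the agent turns toward the stronger stimulus (crossed
    excitatory wiring, as in Braitenberg's vehicle 2b). When both sides are
    roughly balanced it moves forward.
    """
    half = len(frame[0]) // 2
    left = sum(v for row in frame for v in row[:half])
    right = sum(v for row in frame for v in row[half:])
    total = left + right
    if total == 0:
        return "turn_left"  # no stimulus at all: rotate in place to search
    bias = (left - right) / total  # normalized sensor difference in [-1, 1]
    if bias > threshold:
        return "turn_left"
    if bias < -threshold:
        return "turn_right"
    return "forward"
```

Because the function keeps no state between calls, the same frame always yields the same action. Any apparently purposeful navigation must therefore come from how this simple rule interacts with the geometry of the environment.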

Abstract

The navigation of complex labyrinths with tens of rooms under visual, partially observable state is typically addressed using recurrent deep reinforcement learning architectures. In this work, we show that navigation can be achieved through the emergent evolution of a simple Braitenberg-style heuristic that structures the interaction between agent and labyrinth, i.e. complex behaviour from simple heuristics. To do so, we adopt the approach of tangled program graphs, in which programs cooperatively coevolve to develop a modular indexing scheme that employs only 0.8% of the state space. We attribute this simplicity to several biases implicit in the representation, such as the use of pixel indexing as opposed to deploying a convolutional kernel or image processing operators.
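The contrast between pixel indexing and convolution, and what "employs only 0.8% of the state space" could mean, can be sketched as follows. This is a minimal sketch under stated assumptions. The register-machine instruction format (`Instruction` with `+`, `-`, `*`), the register count, and the definition of the indexed fraction (pixels referenced by at least one program divided by total pixels) are hypothetical choices, not the paper's exact representation. The point is that each instruction reads one addressed pixel, so a program only ever touches the pixels it names, whereas a convolutional kernel sweeps the whole frame.

```python
from dataclasses import dataclass
from typing import Sequence, Set


@dataclass(frozen=True)
class Instruction:
    """Hypothetical linear-GP instruction: R[dst] = R[dst] <op> state[pixel]."""
    dst: int
    op: str     # '+', '-' or '*'
    pixel: int  # flat index into the flattened input frame


def run_program(program: Sequence[Instruction], state: Sequence[float],
                n_regs: int = 4) -> float:
    """Execute a program that reads individual pixels and return register 0."""
    regs = [0.0] * n_regs
    for ins in program:
        x = state[ins.pixel]  # direct pixel lookup: no convolution over the frame
        if ins.op == "+":
            regs[ins.dst] += x
        elif ins.op == "-":
            regs[ins.dst] -= x
        elif ins.op == "*":
            regs[ins.dst] *= x
        else:
            raise ValueError(f"unknown op {ins.op!r}")
    return regs[0]


def indexed_fraction(programs: Sequence[Sequence[Instruction]], state_size: int) -> float:
    """Fraction of input pixels referenced by at least one program (assumed metric)."""
    used: Set[int] = {ins.pixel for prog in programs for ins in prog}
    return len(used) / state_size
```

Under this assumed metric, a graph whose programs jointly reference 3 distinct pixels of a 1000-pixel frame would index 0.3% of the state space, however many times those pixels are reused.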

Paper Structure

This paper contains 20 sections, 3 equations, 6 figures, 2 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Map of the 'My Way Home' labyrinth. Room numbers are added for identification purposes only.
  • Figure 2: Illustration of an example TPG champion. $E_{root}$ is the root ensemble from which evaluation always commences, arcs represent context programs ($p_i$) and leaves represent action programs ($a_i$). In this example, there are 4 learners $\in E_{root}$, of which 2 consist of a context program and a corresponding action program ($\langle p_i, a_i \rangle : i \in \{1, 2\}$) and 2 consist of a context program and a pointer to the second ensemble, $E_{node}$. All 4 learners in ensemble $E_{node}$ consist of a context program and a corresponding action program ($\langle p_j, a_j \rangle : j \in \{a, \cdots, d\}$).
  • Figure 3: Distribution of rewards under uniform test conditions: 100 spawns at random orientations per room. Room numbering follows Figure 1. (a) DQN versus (b) TPG. Positive values indicate success in reaching the goal; negative values indicate failure.
  • Figure 4: Example TPG solution paths for spawn points at (a) room 15, (b) room 22 and (c) room 25. Path colour transitions from red (earliest) to blue (latest) as the agent moves from the spawn point to the goal (or becomes lost in the labyrinth). See Figure 1 for the labyrinth room/corridor labels.
  • Figure 5: DQN and TPG paths in the empty room scenario. Neither the DQN nor the TPG agent experienced this environment during training. Colour runs from red (early) to blue (late) along the trajectory.
  • ...and 1 more figure
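The evaluation of a TPG champion such as the one in Figure 2 can be sketched as a walk from $E_{root}$. This follows the general TPG scheme as we understand it and is not the paper's code. At each ensemble every learner's context program bids on the current state, the highest bid wins, and the winner either runs its action program (ending the walk) or passes control to the ensemble it points to. Learners pointing back to an already-visited ensemble are skipped so the walk cannot cycle. The class and function names, and the assumption that an action program returns an integer action, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Set, Union

State = Sequence[float]


@dataclass(eq=False)
class Ensemble:
    """A graph node such as E_root or E_node (identity-hashed so it can be tracked)."""
    name: str
    learners: List["Learner"] = field(default_factory=list)


@dataclass(eq=False)
class Learner:
    """A context program (the bid) paired with an action program or an ensemble pointer."""
    context: Callable[[State], float]
    action: Union[Callable[[State], int], Ensemble]


def evaluate(root: Ensemble, state: State) -> int:
    """Walk from the root ensemble; at each node the highest-bidding learner wins."""
    visited: Set[Ensemble] = set()
    node = root
    while True:
        visited.add(node)
        # Skip learners pointing to an ensemble already visited (cycle guard).
        # Assumes each ensemble keeps at least one learner with an action
        # program, so `candidates` is never empty.
        candidates = [
            lrn for lrn in node.learners
            if not (isinstance(lrn.action, Ensemble) and lrn.action in visited)
        ]
        winner = max(candidates, key=lambda lrn: lrn.context(state))
        if isinstance(winner.action, Ensemble):
            node = winner.action          # follow the pointer to a deeper ensemble
        else:
            return winner.action(state)   # action program yields the final action
```

Only the context programs along the winning path are needed to choose an action, which is one way such a graph can stay modular: different ensembles handle different situations, and each inspects only the inputs its own programs reference.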
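The uniform test protocol behind the reward distributions (100 spawns at random orientations per room, with a positive reward counting as reaching the goal) can be summarized with two small helpers. This is a hypothetical sketch. The episode loop that actually runs an agent in ViZDoom is not shown. Function names, the use of degrees for orientation, and the seeding scheme are assumptions. Only the 100-spawns-per-room count and the sign convention come from the captions.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def spawn_schedule(rooms: Iterable[int], per_room: int = 100,
                   seed: int = 0) -> List[Tuple[int, float]]:
    """(room, orientation in degrees) pairs: `per_room` random headings per room."""
    rng = random.Random(seed)  # seeded so DQN and TPG can face identical spawns
    return [(room, rng.random() * 360.0) for room in rooms for _ in range(per_room)]


def success_rate_by_room(results: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Fraction of episodes per room whose final reward is positive (goal reached)."""
    totals: Dict[int, int] = defaultdict(int)
    wins: Dict[int, int] = defaultdict(int)
    for room, reward in results:
        totals[room] += 1
        if reward > 0:
            wins[room] += 1
    return {room: wins[room] / totals[room] for room in sorted(totals)}
```

Reusing one seeded schedule for both agents means any per-room difference in success rate reflects the policies rather than different spawn orientations.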