Graceful task adaptation with a bi-hemispheric RL agent

Grant Nicholas, Levin Kuhlmann, Gideon Kowadlo

TL;DR

This work introduces a bi-hemispheric reinforcement learning agent inspired by the Novelty-Routine Hypothesis. It pairs a generalist right hemisphere with a specialist left hemisphere to tackle novel tasks. A gating network dynamically allocates responsibility between the hemispheres, with a loss term that encourages shifting control toward the left as tasks are learned and right-hemisphere competency grows, all within a GRU-based architecture. The right hemisphere is meta-trained with RL$^2$ to promote generalisation, while the left hemisphere learns task-specific policies. Experiments on Meta-World show improved initial performance on some tasks when the right hemisphere is competent, but results vary across tasks and novelty levels. The study discusses improvements to meta-training, a possible decoupling of gating from left-hemisphere training, and extensions toward continual learning, and it highlights the approach’s potential to enable rapid adaptation with a structured, neuroscience-inspired inductive bias.
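To make the gating idea concrete, here is a minimal sketch of how a gate could blend the action proposals of two hemispheres. It is an illustration under stated assumptions, not the paper's implementation: the softmax blending rule, the function name `gated_action`, and the two-logit gate output are all hypothetical, and the gating loss term and GRU networks are left out.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_action(gate_logits, left_action, right_action):
    """Blend two action proposals with gate weights.

    gate_logits: [left_logit, right_logit], a hypothetical gating-network output.
    left_action / right_action: equal-length lists of action components.
    Assumption: the blend is a convex combination; the paper may combine
    hemisphere outputs differently.
    """
    w_left, w_right = softmax(gate_logits)
    blended = [w_left * l + w_right * r
               for l, r in zip(left_action, right_action)]
    return blended, (w_left, w_right)


# Early in learning, a gate that favours the right hemisphere lets the
# generalist policy dominate the action.
action, weights = gated_action([-2.0, 2.0], [0.0, 0.0], [1.0, -1.0])
```

In this sketch, shifting control toward the left hemisphere corresponds to the left logit growing relative to the right one over the course of training.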

Abstract

In humans, responsibility for performing a task gradually shifts from the right hemisphere to the left. The Novelty-Routine Hypothesis (NRH) states that the right and left hemispheres are used to perform novel and routine tasks respectively, enabling us to learn a diverse range of novel tasks while still performing each task capably. Drawing on the NRH, we develop a reinforcement learning agent with specialised hemispheres that can exploit generalist knowledge from the right hemisphere to avoid poor initial performance on novel tasks. In addition, we find that this design has minimal impact on the agent's ability to learn novel tasks. We conclude by identifying improvements to our agent and exploring potential expansion to the continual learning setting.

Paper Structure (24 sections, 6 equations, 5 figures)

Figures (5)

  • Figure 1: Experiment overview
  • Figure 2: Training curves. The thick line indicates the median of all seed runs, while the shaded ribbon covers the range of min and max values.
  • Figure 3: Initial bi-hemispheric agent performance relative to left-only baseline: IRR distribution over seeds. IRR scores greater than 1 indicate that initial bi-hemispheric performance exceeds the left-only baseline. Median IRR scores that exceed 1 are shown in blue, while those below 1 are in red. The black line at $IRR=1$ indicates parity with left-only baseline performance.
  • Figure 4: Final left-hemisphere performance relative to left-only baseline: FRR distribution over seeds.
  • Figure 5: Comparison of IRR vs FRR. Black lines at $IRR=1$ and $FRR=1$ divide the plot into quadrants. The upper right quadrant is the ideal outcome where IRR and FRR exceed left-only baseline performance and the bi-hemispheric agent achieves Objectives 1 and 2. The lower-left quadrant indicates failure to achieve either objective.
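
The quadrant reading of Figure 5 can be sketched as a small helper. This is only an illustration: the function name `classify_quadrant` and the label strings are hypothetical, and treating a score of exactly 1 (parity) as not exceeding the baseline is an assumption made here, not something the captions specify.

```python
def classify_quadrant(irr, frr):
    """Label a run by whether IRR and FRR exceed the parity line at 1.

    Assumption: a score of exactly 1.0 counts as not exceeding the
    left-only baseline.
    """
    initial_above = irr > 1.0  # initial performance beats the baseline
    final_above = frr > 1.0    # final performance beats the baseline
    if initial_above and final_above:
        return "both above baseline"      # the ideal quadrant
    if not initial_above and not final_above:
        return "neither above baseline"   # neither objective achieved
    return "initial only" if initial_above else "final only"


# Hypothetical scores, for illustration only (not results from the paper).
label = classify_quadrant(1.4, 1.1)
```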