Heterogeneous Knowledge for Augmented Modular Reinforcement Learning

Lorenz Wolf, Mirco Musolesi

TL;DR

Augmented Modular Reinforcement Learning (AMRL) uses a selector to combine heterogeneous modules, incorporating different types of knowledge representations and processing mechanisms. The paper also examines the safety, robustness, and interpretability issues that arise from knowledge heterogeneity.

Abstract

Existing modular Reinforcement Learning (RL) architectures are generally based on reusable components, also allowing for "plug-and-play" integration. However, these modules are homogeneous in nature - in fact, they essentially provide policies obtained via RL through the maximization of individual reward functions. Consequently, such solutions still lack the ability to integrate and process multiple types of information (i.e., heterogeneous knowledge representations), such as rules, sub-goals, and skills from various sources. In this paper, we discuss several practical examples of heterogeneous knowledge and propose Augmented Modular Reinforcement Learning (AMRL) to address these limitations. Our framework uses a selector to combine heterogeneous modules and seamlessly incorporate different types of knowledge representations and processing mechanisms. Our results demonstrate the performance and efficiency improvements, also in terms of generalization, that can be achieved by augmenting traditional modular RL with heterogeneous knowledge sources and processing mechanisms. Finally, we examine the safety, robustness, and interpretability issues stemming from the introduction of knowledge heterogeneity.
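
The abstract does not spell out how the selector combines modules, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that every module (a rule, a skill, or a learned policy) exposes a probability distribution over the same discrete action set, and that the selector emits one score per module. The function names, module names, and numbers are all hypothetical. "Soft" selection is taken here to mean a softmax-weighted mixture of the module outputs, and "hard" selection to mean following only the top-scoring module.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_select(module_dists, selector_scores):
    """Mix per-module action distributions using softmax selector weights."""
    weights = softmax(selector_scores)
    n_actions = len(module_dists[0])
    mixed = [
        sum(w * dist[a] for w, dist in zip(weights, module_dists))
        for a in range(n_actions)
    ]
    return mixed, weights


def hard_select(module_dists, selector_scores):
    """Follow only the module with the highest selector score."""
    best = max(range(len(selector_scores)), key=lambda i: selector_scores[i])
    return list(module_dists[best]), best


# Hypothetical modules over three actions (all values are illustrative).
rule_module = [1.0, 0.0, 0.0]        # a rule that always proposes action 0
skill_module = [0.1, 0.8, 0.1]       # a pre-trained skill favouring action 1
random_module = [1 / 3, 1 / 3, 1 / 3]  # a module with no useful knowledge

modules = [rule_module, skill_module, random_module]
scores = [2.0, 1.0, -1.0]  # assumed selector outputs for the current state

mixed, weights = soft_select(modules, scores)
chosen, idx = hard_select(modules, scores)

# Sample an action from the soft mixture.
action = random.choices(range(len(mixed)), weights=mixed)[0]
```

Under these assumptions, a low selector score shrinks the contribution of an uninformative module in the soft mixture, whereas hard selection ignores every module except the current top scorer.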

Paper Structure

This paper contains 44 sections, 13 equations, 11 figures, 5 tables, 1 algorithm.

Figures (11)

  • Figure 1: Examples of heterogeneous knowledge and the AMRL architecture. AMRL is able to access several sources of heterogeneous knowledge via modules. The modules can then be updated based on the environment feedback.
  • Figure 2: The achieved reward logged throughout training. In all four environments, AMRL with soft selection uses heterogeneous knowledge to reach good performance more efficiently than the baselines. The hard selection mechanism strongly limits AMRL's capabilities and results in significantly noisier behavior.
  • Figure 3: (a) Percentage of unsafe actions during training on LavaCrossing S9N1. Confidence intervals are $\pm 2$ standard deviations across $10$ random seeds. AMRL performs the fewest unsafe actions, closely followed by KIAN. (b) AMRL selector weights during a single evaluation episode on DoorKey 8x8, showing the mean $\pm 2$ standard deviations calculated across $10$ training seeds. On average, agents pick up the key with their 6th action and unlock the door with their 11th action, as indicated by the dashed lines. The second milestone causes a shift in the module weights away from the unlock skill and towards the dynamic module.
  • Figure 4: The reward dynamics during training on MiniGrid-LavaCrossing-S9N1 (training for $1.5 \times 10^6$ frames). The set of original modules is modified by adding 1, 3, and 5 additional modules outputting uniformly random actions. AMRL demonstrates consistent robustness across varying numbers of random modules, while AMRL$_{hard}$, KIAN, and KoGuN exhibit higher sensitivity, with performance and sample efficiency degrading significantly as the number of random modules increases.
  • Figure 5: Average training run performance on DoorKey 8x8 with different levels of knowledge.
  • ...and 6 more figures