
Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts

Ahmed Hendawy, Jan Peters, Carlo D'Eramo

TL;DR

This work introduces MOORE, a representation-learning framework for Multi-Task Reinforcement Learning (MTRL) that enforces diversity among shared representations by orthogonalizing the outputs of a mixture of experts. MOORE maps each state to $k$ orthonormal representations, which form a point on the Stiefel manifold and span a shared $k$-dimensional subspace. It then combines them into a task-specific representation with weights $w_c$, which yields a universal policy applicable across tasks. Empirical results on MiniGrid and MetaWorld show that MOORE achieves state-of-the-art performance on MetaWorld, strong transfer capabilities, and interpretable representations, with ablations validating the importance of Gram-Schmidt orthogonalization. The approach offers a principled way to balance shared structure and task-specific needs in MTRL, with potential extensions to sparse expert selection and continual learning.
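
The quantities named in this summary can be written out explicitly. The notation below is our own sketch, not the paper's equations. It assumes that each of the $k$ experts maps the state $s$ to a vector $u_j \in \mathbb{R}^d$ with $d \ge k$. It also assumes these vectors are linearly independent, since otherwise Gram-Schmidt produces a zero vector. Finally, it assumes that $w_c$ forms a linear combination of the orthonormal columns of $V_s$.

```latex
% Stiefel manifold: d x k matrices with orthonormal columns
\mathcal{V}_k(\mathbb{R}^d) = \{\, V \in \mathbb{R}^{d \times k} : V^\top V = I_k \,\}

% Gram-Schmidt on the expert outputs u_1, ..., u_k (assumed linearly independent)
\tilde{u}_j = u_j - \sum_{i=1}^{j-1} \langle u_j, \hat{u}_i \rangle \, \hat{u}_i,
\qquad \hat{u}_j = \frac{\tilde{u}_j}{\lVert \tilde{u}_j \rVert},
\qquad V_s = [\, \hat{u}_1, \dots, \hat{u}_k \,] \in \mathcal{V}_k(\mathbb{R}^d)

% Task-specific representation for task c, then the output function f_theta
v_c = V_s \, w_c, \quad w_c \in \mathbb{R}^k, \qquad \text{output} = f_\theta(v_c)
```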

Abstract

Multi-Task Reinforcement Learning (MTRL) tackles the long-standing problem of endowing agents with skills that generalize across a variety of problems. To this end, sharing representations plays a fundamental role in capturing both unique and common characteristics of the tasks. Tasks may exhibit similarities in terms of skills, objects, or physical properties, and leveraging their representations eases the achievement of a universal policy. Nevertheless, the pursuit of learning a shared set of diverse representations is still an open challenge. In this paper, we introduce a novel approach for representation learning in MTRL that encapsulates common structures among the tasks using orthogonal representations to promote diversity. Our method, named Mixture Of Orthogonal Experts (MOORE), leverages a Gram-Schmidt process to shape a shared subspace of representations generated by a mixture of experts. When task-specific information is provided, MOORE generates relevant representations from this shared subspace. We assess the effectiveness of our approach on two MTRL benchmarks, namely MiniGrid and MetaWorld, showing that MOORE surpasses related baselines and establishes a new state-of-the-art result on MetaWorld.
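
The pipeline in the abstract (a mixture of experts, then Gram-Schmidt, then a task-conditioned combination) can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation. The experts are hypothetical single tanh layers. The task weights $w_c$ are rows of a hypothetical matrix `task_weights`, indexed by task and standing in for whatever task encoder is used. The output function $f_{\theta}$ is replaced by a hypothetical linear `head`. The sketch uses modified Gram-Schmidt, which is numerically more stable than the textbook form but gives the same basis in exact arithmetic. It raises an error when the expert outputs are (nearly) linearly dependent.

```python
import numpy as np


def gram_schmidt(U, eps=1e-8):
    """Orthonormalize the columns of U (shape d x k) from left to right.

    Modified Gram-Schmidt: each projection is taken against the running
    residual rather than the original column.
    """
    d, k = U.shape
    V = np.zeros((d, k))
    for j in range(k):
        v = U[:, j].astype(float)  # astype returns a copy, so U is not modified
        for i in range(j):
            # Remove the component of the residual along the i-th basis vector.
            v = v - np.dot(V[:, i], v) * V[:, i]
        norm = np.linalg.norm(v)
        if norm < eps:
            raise ValueError("expert outputs are (nearly) linearly dependent")
        V[:, j] = v / norm
    return V


# Hypothetical sizes and randomly initialized parameters (illustration only).
rng = np.random.default_rng(0)
state_dim, rep_dim, k, n_tasks, out_dim = 8, 16, 4, 3, 2
experts = [rng.normal(size=(rep_dim, state_dim)) for _ in range(k)]  # one tanh layer per expert
task_weights = rng.normal(size=(n_tasks, k))  # row c plays the role of w_c
head = rng.normal(size=(out_dim, rep_dim))    # linear stand-in for f_theta


def moore_forward(s, c):
    """Return the output for state s and task index c, plus the orthonormal V_s."""
    U = np.stack([np.tanh(W @ s) for W in experts], axis=1)  # (rep_dim, k) expert outputs
    V_s = gram_schmidt(U)            # orthonormal columns: V_s.T @ V_s = I_k
    v_c = V_s @ task_weights[c]      # task-specific representation
    return head @ v_c, V_s


s = rng.normal(size=state_dim)
y, V_s = moore_forward(s, c=1)
print(y.shape, np.allclose(V_s.T @ V_s, np.eye(k)))  # prints: (2,) True
```

Because the columns of `V_s` are orthonormal, each expert contributes a distinct direction, and the task weights only decide how these shared directions are mixed. The paper applies this architecture to both the actor and the critic, whereas the sketch shows a single forward pass.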

Paper Structure

This paper contains 26 sections, 3 equations, 16 figures, 7 tables, and 2 algorithms.

Figures (16)

  • Figure 1: MOORE illustrative diagram. A state $s$ is encoded as a set of representations using a mixture of experts. The Gram-Schmidt process orthogonalizes the representations to encourage diversity. The output head then combines the orthogonal representations $V_s$ into the task-specific representation $v_{c}$ using the task-specific weights $w_{c}$, and computes the output from $v_{c}$ with the output function $f_{\theta}$. In our approach, we employ this architecture for both the actor and the critic.
  • Figure 2: Average return on the three MTRL scenarios of MiniGrid. We utilize both multi-head and single-head architectures for our approach MOORE as well as the related baselines. For MOORE, MOE and PCGrad, the number of experts $k$ is 2, 3, and 4 for MT3, MT5, and MT7, respectively. The black dashed line represents the final single-task performance of PPO averaged across all tasks. For the evaluation metric, we compute the accumulated return averaged across all tasks. We report the mean and the 95% confidence interval across 30 different runs.
  • Figure 3: Evaluating MOORE against MOE on the transfer setting. The study is conducted on the two transfer learning scenarios in MiniGrid, employing a multi-head architecture. The number of experts $k$ is 2 and 3 for MT3 $\rightarrow$ MT5 and MT5 $\rightarrow$ MT7, respectively. For the evaluation metric, we compute the accumulated return averaged across all tasks. We report the mean and the 95% confidence interval across 30 different runs.
  • Figure 4: Ablation study on the effect of changing the number of experts. We compare the performance of MOE and MOORE (ours) on MiniGrid MT7 using a single-head architecture. We report the mean of the evaluation metric across 30 seeds. For the evaluation metric, we compute the accumulated return averaged across all tasks.
  • Figure 5: (a) Success rate on MetaWorld MT10-rand comparing MOORE against MOE with $4$ experts. (b) Success rate on MetaWorld MT50-rand comparing MOORE against MOE with $6$ experts. We show the average success rate across all tasks and the $95\%$ confidence interval across $10$ and $5$ different runs for MT10-rand and MT50-rand, respectively.
  • ...and 11 more figures
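
The figures report the mean and a 95% confidence interval, computed across runs, of the return averaged over tasks (or of the success rate on MetaWorld). The captions do not say how the interval is computed. This sketch assumes a normal-approximation interval (z = 1.96) over per-run task averages. A Student-t interval or a bootstrap would be reasonable alternatives; with 30 runs, the t critical value (about 2.05) gives a slightly wider band. The function name `mean_and_ci` is hypothetical, and the data are random placeholders, not results from the paper.

```python
import numpy as np


def mean_and_ci(returns, z=1.96):
    """Aggregate evaluation curves across runs.

    returns: array of shape (n_runs, n_tasks, n_evals) holding the accumulated
    return of each task at each evaluation step.
    Returns the across-run mean of the task-averaged return and the half-width
    of a normal-approximation confidence interval (z = 1.96 for 95%).
    """
    per_run = returns.mean(axis=1)  # average across tasks -> (n_runs, n_evals)
    n_runs = per_run.shape[0]
    mean = per_run.mean(axis=0)
    sem = per_run.std(axis=0, ddof=1) / np.sqrt(n_runs)  # standard error of the mean
    return mean, z * sem


# Random placeholder data with an MT7-like shape: 30 runs, 7 tasks, 50 evaluation points.
rng = np.random.default_rng(0)
fake_returns = rng.uniform(0.0, 1.0, size=(30, 7, 50))
mean, half_width = mean_and_ci(fake_returns)
lower, upper = mean - half_width, mean + half_width  # band to shade around the mean curve
```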

Theorems & Definitions (3)

  • Definition 4.1
  • Definition 4.2
  • Definition 4.3