Model Evolution Framework with Genetic Algorithm for Multi-Task Reinforcement Learning

Yan Yu, Wengang Zhou, Yaodong Yang, Wanxuan Lu, Yingyan Hou, Houqiang Li

TL;DR

MEGA addresses the challenge of learning a single policy for multiple tasks of varying difficulty by evolving the model structure through a genotype module-level model. It uses variable-length binary genotype policies to allocate and weight modules. A stage-driven mechanism adds modules when the current model cannot complete a task. A non-gradient genetic algorithm optimizes task-specific genotype populations, and the HalfSoftmax transform enables flexible, sparse module utilization. Experiments on the Meta-World benchmark show state-of-the-art performance and improved module efficiency, supporting the framework's dynamic, task-aware approach to scalable multi-task reinforcement learning.
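The stage-driven growth can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that each stage adds one module and that a stage-N genotype holds (N+1)(N+2)/2 four-bit segments, one per module weight, following the Figure 1 layout. It also assumes that a task advances a stage when its success rate is below a threshold. The names `genotype_length` and `maybe_grow`, the 0.9 threshold, and the stage cap are hypothetical.

```python
import random
from typing import List, Optional, Tuple

BITS_PER_WEIGHT = 4  # Figure 1: a 4-bit segment generates one module weight


def genotype_length(stage: int) -> int:
    """Bits needed at `stage`, assuming (stage+1)(stage+2)/2 module weights."""
    return BITS_PER_WEIGHT * (stage + 1) * (stage + 2) // 2


def maybe_grow(
    genotype: List[int],
    stage: int,
    success_rate: float,
    threshold: float = 0.9,  # hypothetical "task solved" threshold
    max_stage: int = 8,      # hypothetical cap on the number of modules
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], int]:
    """Move an unsolved task to the next stage and lengthen its genotype."""
    if success_rate >= threshold or stage >= max_stage:
        return genotype, stage
    rng = rng if rng is not None else random.Random()
    new_stage = stage + 1
    extra = genotype_length(new_stage) - len(genotype)
    # Keep existing genes in place; only the newly required segments are random.
    return genotype + [rng.randint(0, 1) for _ in range(extra)], new_stage


if __name__ == "__main__":
    g = [0] * genotype_length(1)  # 12 bits at stage 1
    g, stage = maybe_grow(g, 1, success_rate=0.4, rng=random.Random(0))
    print(stage, len(g))  # 2 24
```

Under this assumed layout, moving from stage N to N+1 appends exactly 4(N+2) bits, so previously evolved genes are preserved. Whether MEGA preserves genes this way is an assumption.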

Abstract

Multi-task reinforcement learning employs a single policy to complete various tasks, aiming to develop an agent that generalizes across different scenarios. Because tasks share characteristics, parameter sharing can improve the agent's learning efficiency. Existing approaches typically use a routing network to generate a specific route for each task, reconstructing a set of modules into diverse models that complete multiple tasks simultaneously. However, because tasks differ inherently, it is crucial to allocate resources according to task difficulty, and such allocation is constrained by the model's structure. To this end, we propose a Model Evolution framework with Genetic Algorithm (MEGA), which enables the model to evolve during training according to the difficulty of the tasks. When the current model is insufficient for certain tasks, the framework automatically incorporates additional modules, enhancing the model's capabilities. Moreover, to fit this model evolution framework, we introduce a genotype module-level model that uses binary sequences as genotype policies for model reconstruction and a non-gradient genetic algorithm to optimize these genotype policies. Unlike routing networks with fixed output dimensions, our approach allows the genotype policy length to be adjusted dynamically, so it can accommodate models with a varying number of modules. We conducted experiments on various robotic manipulation tasks in the Meta-World benchmark, where state-of-the-art performance demonstrated the effectiveness of the MEGA framework. We will release our source code to the public.
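The genetic-algorithm step can be sketched as one generation over a task's population of binary genotype policies. The generation consists of truncation selection, single-point crossover, and bit-flip mutation. This is a hedged illustration rather than MEGA's exact procedure. The selection scheme, the 50% survival ratio, and the mutation rate are assumptions. `fitness` stands in for the reward a policy obtains on its task, which the paper uses as fitness. No gradients are involved.

```python
import random
from typing import Callable, List, Optional

Genotype = List[int]  # a binary genotype policy, e.g. [0, 1, 1, 0, ...]


def crossover(a: Genotype, b: Genotype, rng: random.Random) -> Genotype:
    """Single-point crossover: prefix of `a` joined to the suffix of `b`."""
    point = rng.randint(1, len(a) - 1)  # assumes equal lengths >= 2
    return a[:point] + b[point:]


def mutate(g: Genotype, rate: float, rng: random.Random) -> Genotype:
    """Flip each gene independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in g]


def next_generation(
    population: List[Genotype],
    fitness: Callable[[Genotype], float],
    keep: float = 0.5,            # hypothetical survival ratio
    mutation_rate: float = 0.05,  # hypothetical per-gene flip probability
    rng: Optional[random.Random] = None,
) -> List[Genotype]:
    rng = rng if rng is not None else random.Random()
    # Rank by fitness (a stand-in for the policy's reward) and
    # discard the less fit part of the population.
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[: max(2, int(len(population) * keep))]
    # Refill the population with mutated offspring of random parent pairs.
    children: List[Genotype] = []
    while len(parents) + len(children) < len(population):
        a, b = rng.sample(parents, 2)
        children.append(mutate(crossover(a, b, rng), mutation_rate, rng))
    return parents + children


if __name__ == "__main__":
    rng = random.Random(0)
    population = [[rng.randint(0, 1) for _ in range(12)] for _ in range(20)]
    for _ in range(30):
        # Toy fitness: number of ones, standing in for an environment reward.
        population = next_generation(population, fitness=sum, rng=rng)
    print(max(sum(g) for g in population))
```

Because the survivors are carried over unchanged, the best fitness in the population never decreases from one generation to the next. In practice each fitness call would require running the policy in the environment, so caching rewards per genotype would be worthwhile.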

Paper Structure

This paper contains 26 sections, 7 equations, 12 figures, 3 tables, 5 algorithms.

Figures (12)

  • Figure 1: For tasks with varying difficulty, our MEGA uses binary genotype policies with different lengths to allocate varying quantities of modules. The 'stage' denotes the number of modules allocated to the task. A segment of the genotype policy with a length of $4$ is used to generate a single module weight. Different segments of the genotype policy, represented by distinct colors, generate module weights at different levels. For a task at stage $N$, the final segment of the genotype policy generates $N+1$ weights, which are used to weight both the initial input and the outputs of $N$ modules.
  • Figure 2: Models with different structures. (a) The model handles each task with a specific output head. (b) The routing network generates module weights for model reconstruction. (c) The routing network selects specific module combinations for different tasks.
  • Figure 3: The genotype module-level model. A genotype policy is selected from the multi-task community based on the task ID, decomposed into module weights, and used to reconstruct the module-level model. To enhance its capabilities, the model continuously evolves by incorporating additional modules.
  • Figure 4: The genetic algorithm optimizes the genotype policy population as follows. (a) For each task, a genotype policy is selected from the task population to perform the task, and its reward serves as fitness. During optimization, crossover and mutation operations are applied within the task population, while less fit policies are discarded. (b) The crossover operator exchanges parts of two genotype policies to create new ones, while the mutation operator modifies a single genotype policy by flipping some of its genes.
  • Figure 5: The mechanism of the model evolution framework. When the current model cannot complete certain tasks, the framework dynamically adds modules, and the model evolves to adapt to the more difficult tasks.
  • ...and 7 more figures
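The genotype-to-model mapping described in the Figure 1 and Figure 3 captions can be sketched as below. This is an assumption-laden illustration, not MEGA's code. It reads each 4-bit segment as an unsigned integer scaled to [0, 1]. It also assumes that the weights for module j cover the initial input and the outputs of modules 1 to j-1, so a stage-N genotype holds (N+1)(N+2)/2 segments, with the final segment producing the N+1 output weights. Each level is normalized with an ordinary softmax purely as a placeholder, since this summary does not define the HalfSoftmax transform. The helper names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

BITS = 4  # Figure 1: one 4-bit segment per module weight
Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Plain softmax; a placeholder for the paper's HalfSoftmax transform."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode(genotype: Sequence[int], stage: int) -> List[List[float]]:
    """Split a stage-N genotype into N+1 levels of module weights.

    Level j (0-based) has j+1 weights: for j < N they mix the input of
    module j+1; the final level (j = N) mixes the initial input and the
    outputs of all N modules.
    """
    expected = BITS * (stage + 1) * (stage + 2) // 2
    if len(genotype) != expected:
        raise ValueError(f"stage {stage} needs {expected} bits, got {len(genotype)}")
    levels: List[List[float]] = []
    pos = 0
    for j in range(stage + 1):
        scores = []
        for _ in range(j + 1):
            segment = genotype[pos : pos + BITS]
            value = int("".join(str(bit) for bit in segment), 2)
            scores.append(value / (2**BITS - 1))  # scale to [0, 1]
            pos += BITS
        levels.append(softmax(scores))
    return levels


def combine(vectors: List[Vector], weights: Sequence[float]) -> Vector:
    """Weighted sum of equally sized vectors."""
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(len(vectors[0]))]


def forward(x: Vector, modules: List[Callable[[Vector], Vector]],
            weights: List[List[float]]) -> Vector:
    outputs = [x]  # index 0 holds the initial input
    for module, w in zip(modules, weights[:-1]):
        outputs.append(module(combine(outputs, w)))
    # Final level weights the initial input and every module output.
    return combine(outputs, weights[-1])


if __name__ == "__main__":
    weights = decode([0] * 12, stage=1)  # all-zero genes give uniform weights
    print(weights)  # [[1.0], [0.5, 0.5]]
    double = lambda v: [2.0 * a for a in v]
    print(forward([1.0], [double], weights))  # [1.5]
```

In this sketch the genotype length, and therefore the number of weights, follows the stage, which is the property that distinguishes the approach from a routing network with a fixed output dimension.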