Learning Where, What and How to Transfer: A Multi-Role Reinforcement Learning Approach for Evolutionary Multitasking
Jiajun Zhan, Zeyuan Ma, Yue-Jiao Gong, Kay Chen Tan
TL;DR
This work tackles evolutionary multitasking (EMT) by learning a single, generalizable policy that decides where, what, and how to transfer knowledge across tasks. It introduces MetaMTO, a multi-role RL system with a Task Routing agent, a Knowledge Control agent, and a group of Transfer Strategy Adaptation agents, trained end-to-end with PPO on an augmented multitask problem distribution (AWCCI). Empirical results show state-of-the-art performance against both human-crafted and learning-assisted baselines, and ablation and interpretability analyses indicate that intelligent routing and adaptive transfer strategies drive the gains. The approach offers a scalable, data-driven framework for automating knowledge transfer in EMT, with practical implications for robust multitask optimization.
Abstract
Evolutionary multitasking (EMT) algorithms typically require tailored designs for knowledge transfer to ensure convergence and optimality in multitask optimization. In this paper, we explore designing a systematic and generalizable knowledge transfer policy through Reinforcement Learning (RL). We first identify three major challenges: determining the task to transfer from (where), the knowledge to be transferred (what), and the mechanism for the transfer (how). To address these challenges, we formulate a multi-role RL system in which three (groups of) policy networks act as specialized agents: a task routing agent incorporates an attention-based similarity recognition module to determine source-target transfer pairs via attention scores; a knowledge control agent determines the proportion of elite solutions to transfer; and a group of strategy adaptation agents control transfer strength by dynamically adjusting hyper-parameters in the underlying EMT framework. By pre-training all network modules end-to-end over an augmented multitask problem distribution, we obtain a generalizable meta-policy. Comprehensive validation experiments show state-of-the-art performance of our method against representative baselines. Further in-depth analysis not only reveals the rationale behind our proposal but also provides insightful interpretations of what the system has learned.
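To make the "where" and "what" decisions concrete, the following is a minimal illustrative sketch, not the MetaMTO implementation. It assumes each task is summarized by a hand-made feature vector, replaces the learned attention module with plain scaled dot-product attention, and treats the elite proportion as a given number rather than the output of a policy network. The function names `route_tasks` and `transfer_elites` are hypothetical, and the "how" role (adapting hyper-parameters of the underlying EMT framework) is omitted.

```python
import numpy as np


def route_tasks(task_features):
    """'Where': pick one source task per target task from attention scores.

    Assumption: plain scaled dot-product attention over task feature
    vectors stands in for the learned similarity recognition module.
    """
    feats = np.asarray(task_features, dtype=float)
    d = feats.shape[1]
    scores = feats @ feats.T / np.sqrt(d)
    np.fill_diagonal(scores, -np.inf)  # a task never transfers to itself
    scores = scores - scores.max(axis=1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights.argmax(axis=1), weights


def transfer_elites(source_pop, source_fit, target_pop, target_fit, proportion):
    """'What': copy the best source solutions over the worst target ones.

    Assumptions: minimization, and `proportion` (in [0, 1]) is supplied by
    the caller instead of a knowledge control agent.
    """
    n = int(round(proportion * len(target_pop)))
    n = min(n, len(source_pop))
    new_pop = np.array(target_pop, dtype=float, copy=True)
    if n == 0:
        return new_pop
    elite_idx = np.argsort(source_fit)[:n]   # lowest fitness = best
    worst_idx = np.argsort(target_fit)[-n:]  # highest fitness = worst
    new_pop[worst_idx] = np.asarray(source_pop, dtype=float)[elite_idx]
    return new_pop
```

In this sketch, calling `route_tasks` on a `(num_tasks, feature_dim)` array returns, for each target task, the index of the task with the highest attention weight; the chosen source population can then be passed to `transfer_elites` together with a transfer proportion. In the paper's setting both decisions are made by trained policy networks rather than by the fixed rules used here.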
