Multicopy Reinforcement Learning Agents
Alicia P. Wolfe, Oliver Diamond, Brigitte Goeler-Slough, Remi Feuerman, Magdalena Kisielinska, Victoria Manfredi
TL;DR
This work introduces multicopy reinforcement learning, a framework where an agent can create multiple copies to tackle noisy tasks, balancing per-copy costs with an optimization reward drawn from the best copy. The method decomposes the return into a cost component $G_c$ summed over all copies and an optimization component $G_o$ contributed by the best copy, enabling tractable learning with separate $Q_c$ and $Q_o$ value functions. It shows that joint-state consideration is unnecessary except at the duplication decision and demonstrates through Three Bridges gridworld experiments that multicopy policies can adapt the number and placement of copies, including cases with identical duplicates and correlated risks. The approach achieves strong performance gains over a baseline joint-action RL method, particularly under high noise or risk, and points to future work with neural approximators, richer communication, and distributional or multiobjective extensions for broader real-world applications.
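The return decomposition described above can be illustrated with a minimal sketch. It rests on assumptions that the summary does not spell out: the two components are combined by simple addition, costs are expressed as negative per-copy returns, and the "best copy" is the one with the largest optimization return. The function name `multicopy_return` and its arguments are hypothetical and chosen for illustration only.

```python
from typing import Sequence


def multicopy_return(cost_returns: Sequence[float],
                     opt_returns: Sequence[float]) -> float:
    """Combine per-copy returns into a single multicopy return.

    Assumptions (not taken from the paper): the total is G_c + G_o,
    costs are negative numbers, and "best" means the maximum G_o.
    """
    if not cost_returns or len(cost_returns) != len(opt_returns):
        raise ValueError("need one cost and one optimization return per copy")
    g_c = sum(cost_returns)   # cost component: every copy pays its own cost
    g_o = max(opt_returns)    # optimization component: only the best copy counts
    return g_c + g_o


# Three copies: each incurs a cost, but only the best outcome is rewarded.
total = multicopy_return([-1.0, -1.5, -0.5], [0.0, 10.0, 4.0])
print(total)  # 7.0
```

In this toy case an extra copy is worthwhile only if it raises the maximum optimization return by more than the cost it adds, which is the trade-off that the separate $Q_c$ and $Q_o$ value functions are meant to capture.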
Abstract
This paper examines a novel type of multi-agent problem in which an agent makes multiple identical copies of itself in order to achieve a single-agent task better or more efficiently. This strategy improves performance if the environment is noisy and the task is sometimes unachievable by a single agent copy. We propose a learning algorithm for this multicopy problem that takes advantage of the structure of the value function to efficiently learn how to balance the advantages and costs of adding copies.
