
Multicopy Reinforcement Learning Agents

Alicia P. Wolfe, Oliver Diamond, Brigitte Goeler-Slough, Remi Feuerman, Magdalena Kisielinska, Victoria Manfredi

TL;DR

This work introduces multicopy reinforcement learning, a framework in which an agent can create multiple copies of itself to tackle noisy tasks, balancing per-copy costs against an optimization reward drawn from the best copy. The method decomposes the return into a cost component $G_c$, summed over all copies, and an optimization component $G_o$, contributed by the best copy alone; this enables tractable learning with separate $Q_c$ and $Q_o$ value functions. The paper shows that the joint state of the copies needs to be considered only at the duplication decision, and Three Bridges gridworld experiments demonstrate that multicopy policies can adapt the number and placement of copies, including cases with identical duplicates and correlated risks. The approach improves return over a baseline joint-action RL method, particularly under high noise or risk, and the authors point to future work with neural approximators, richer communication, and distributional or multiobjective extensions for broader real-world applications.
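The return decomposition above can be written for one episode with $k$ copies as $G = G_c + G_o = \sum_{i=1}^{k} G_c^{(i)} + \max_i G_o^{(i)}$: every copy contributes its costs, but only the best copy contributes its optimization return. The Python sketch that follows computes both terms for a finished episode. It is a minimal illustration, not the authors' implementation: it assumes undiscounted returns and assumes each step's reward already arrives split into a (cost, optimization) pair; the names `Trajectory` and `multicopy_return` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: one copy's trajectory as a list of
# (cost_reward, optimization_reward) pairs, one pair per step.
# Assumes the per-step reward has already been split into these two parts.
Trajectory = List[Tuple[float, float]]


def multicopy_return(copies: List[Trajectory]) -> Tuple[float, float, float]:
    """Return (G_c, G_o, G) for one episode (undiscounted; an assumption).

    G_c sums the cost rewards of every copy, G_o keeps only the largest
    per-copy optimization return, and G = G_c + G_o.
    Requires at least one copy.
    """
    # Cost component: every copy pays its own costs, so sum over all copies.
    g_c = sum(sum(c for c, _ in traj) for traj in copies)
    # Optimization component: only the best copy's optimization return counts.
    g_o = max(sum(o for _, o in traj) for traj in copies)
    return g_c, g_o, g_c + g_o


# Two copies, per-step cost -2: copy A pays two step costs and then reaches a
# +20 terminal; copy B pays two step costs and never reaches the goal.
copy_a = [(-2.0, 0.0), (-2.0, 0.0), (0.0, 20.0)]
copy_b = [(-2.0, 0.0), (-2.0, 0.0)]
print(multicopy_return([copy_a, copy_b]))  # (-8.0, 20.0, 12.0)
```

Copy B's costs still lower the return even though it earns nothing, which is exactly the pressure that keeps a multicopy policy from duplicating indiscriminately.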

Abstract

This paper examines a novel type of multi-agent problem in which an agent makes multiple identical copies of itself to achieve a single-agent task better or more efficiently. This strategy improves performance when the environment is noisy and the task is sometimes unachievable by a single copy. We propose a learning algorithm for this multicopy problem that exploits the structure of the value function to learn efficiently how to balance the advantages and costs of adding copies.
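To see why the number of copies is a trade-off, consider a deliberately simplified model that is not taken from the paper: each of $k$ copies independently reaches the goal with probability $p$, the optimization reward $R$ is collected once if any copy succeeds, and each copy incurs an expected cost $c < 0$. The expected return is then $k\,c + R\,\bigl(1 - (1 - p)^k\bigr)$. The hypothetical helpers `expected_return` and `best_copy_count` evaluate this formula and pick the best $k$. The paper's algorithm instead learns $Q_c$ and $Q_o$ from experience, and its experiments also include correlated risks, which this independence assumption ignores.

```python
def expected_return(k: int, p_success: float, reward: float, cost_per_copy: float) -> float:
    """Expected return of k copies in a toy model with independent copies.

    Costs add up over copies (cost_per_copy is negative); the optimization
    reward is collected once, with probability 1 - (1 - p)^k that at least
    one copy succeeds. Independence is an illustrative assumption.
    """
    p_any_success = 1.0 - (1.0 - p_success) ** k
    return k * cost_per_copy + reward * p_any_success


def best_copy_count(p_success: float, reward: float, cost_per_copy: float,
                    max_copies: int = 10) -> int:
    """Copy count in 1..max_copies with the highest expected return."""
    return max(range(1, max_copies + 1),
               key=lambda k: expected_return(k, p_success, reward, cost_per_copy))


# Noisy task (p = 0.5): extra copies pay off, but only up to a point.
print(best_copy_count(0.5, 100.0, -10.0))   # 3
# Reliable task (p = 0.95): one copy is enough.
print(best_copy_count(0.95, 100.0, -10.0))  # 1
```

With $p = 0.5$ the expected returns for one to four copies are 40, 55, 57.5 and 53.75, so the marginal success probability of a fourth copy no longer covers its cost.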

Paper Structure (15 sections, 16 equations, 6 figures, 3 tables)

This paper contains 15 sections, 16 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Shadowed Equilibrium Example.
  • Figure 2: Three-bridge gridworlds. Terminal states are marked with double squares. All other states have 4 actions: North, South, East, West. Positive terminal state rewards are optimization rewards; negative ones are costs. White squares have a per-step cost that is set for each experiment.
  • Figure 3: Three Bridges: Learning results for the basic (non-multicopy) and multicopy algorithms on the Three Bridges gridworld. The cost per step is -2 in all plots. Shaded lines are averages over 50 trial runs; dark lines are rolling 30-episode averages of those values.
  • Figure 4: Varying Noise and Cost. Return and policy during testing for various cost and noise settings. (a) shows the improvement in return from the Joint Action algorithm to Multicopy; (b) shows the best actions. 50 trial runs.
  • Figure 5: Identical Duplicate Actions. Return and policy during testing for various cost and noise settings when identical duplicate actions are allowed. 50 trial runs.
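The Figure 2 caption implies a simple rule for splitting gridworld rewards into the two components the method learns separately: per-step costs and negative terminal rewards are costs, while positive terminal rewards are optimization rewards. The sketch that follows encodes that rule. The function name `split_reward`, the assumption that entering a terminal square yields its terminal reward in place of the step cost, and the -50 hazard value in the usage lines are illustrative choices, not details taken from the paper.

```python
from typing import Optional, Tuple


def split_reward(step_cost: float, terminal_reward: Optional[float] = None) -> Tuple[float, float]:
    """Split one transition's reward into (cost, optimization) components.

    Hypothetical helper. Assumption: a move onto a white square yields the
    per-step cost (a negative number); a move onto a terminal square yields
    that square's terminal reward instead. Positive terminal rewards count as
    optimization rewards and negative ones as costs, per the Figure 2 caption.
    """
    if terminal_reward is None:
        return step_cost, 0.0          # ordinary move: pure cost
    if terminal_reward > 0:
        return 0.0, terminal_reward    # goal terminal: optimization reward
    return terminal_reward, 0.0        # hazard terminal: cost


# Two ordinary moves at a per-step cost of -2, then a +10 goal terminal.
path = [split_reward(-2.0), split_reward(-2.0), split_reward(-2.0, 10.0)]
print(path)                       # [(-2.0, 0.0), (-2.0, 0.0), (0.0, 10.0)]
# A hypothetical -50 hazard terminal is treated as a cost.
print(split_reward(-2.0, -50.0))  # (-50.0, 0.0)
```

Keeping the two components separate per step is what allows the cost part to be summed over copies while the optimization part is credited only to the best copy.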