PufferLib: Making Reinforcement Learning Libraries and Environments Play Nice

Joseph Suarez

TL;DR

PufferLib tackles the incompatibility between rich RL environments and standard learning libraries by introducing an emulation layer that makes diverse environments appear Atari-like, enabling seamless use with existing tools. It pairs one-line environment wrappers with fast vectorization, a Docker-based development stack (PufferTank), and bindings for numerous environments, achieving significant speedups and enabling complex multiagent settings. Key contributions include the emulation stack, broad environment bindings, optimized vectorization pathways, and first-party environments like Puffer Ocean, along with demonstrated impact on Neural MMO and Pokemon Red projects. The work lowers setup friction, improves performance, and broadens the practical applicability of RL research to more complex, realistic tasks, promoting faster experimentation and reproducibility.

Abstract

You have an environment, a model, and a reinforcement learning library that are designed to work together but don't. PufferLib makes them play nice. The library provides one-line environment wrappers that eliminate common compatibility problems and fast vectorization to accelerate training. With PufferLib, you can use familiar libraries like CleanRL and SB3 to scale from classic benchmarks like Atari and Procgen to complex simulators like NetHack and Neural MMO. We release pip packages and prebuilt images with dependencies for dozens of environments. All of our code is free and open-source software under the MIT license, complete with baselines, documentation, and support at pufferai.github.io.

Paper Structure (13 sections, 1 figure, 2 tables)

Background and Introduction
PufferTank
PufferLib
Emulation
Environments
Vectorization
Models
First-party Environments: Puffer Ocean
Performance
First-party Training with Clean PuffeRL
Proof of Impact
Limitations
Conclusion

Figures (1)

Figure 1: The PufferLib system architecture for broad environment compatibility and fast vectorization. Each core simulates one or several environments. Vectorization aggregates data from several processes and distributes actions across them. Each environment is wrapped in PufferLib's core emulation layer, which ensures flat data representations.
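The "flat data representations" mentioned in the caption can be illustrated with a minimal sketch. This is not PufferLib's actual API: the helper names `layout`, `flatten`, and `unflatten` are hypothetical, and the sketch assumes observations are nested dicts of NumPy arrays that can all be cast to float32. It shows only the general idea of packing a structured observation into one flat vector that a standard learning library can consume, then recovering the structure on the model side.

```python
import numpy as np

def layout(obs):
    """Record the structure of a nested observation: sorted keys and leaf shapes."""
    if isinstance(obs, dict):
        return {key: layout(obs[key]) for key in sorted(obs)}
    return np.asarray(obs).shape

def flatten(obs):
    """Concatenate all leaves (visited in sorted-key order) into one float32 vector."""
    if isinstance(obs, dict):
        parts = [flatten(obs[key]) for key in sorted(obs)]
        return np.concatenate(parts) if parts else np.zeros(0, dtype=np.float32)
    return np.asarray(obs, dtype=np.float32).ravel()

def unflatten(flat, spec, offset=0):
    """Rebuild the nested structure from a flat vector; returns (value, next_offset)."""
    if isinstance(spec, dict):
        out = {}
        for key, sub_spec in spec.items():
            out[key], offset = unflatten(flat, sub_spec, offset)
        return out, offset
    size = int(np.prod(spec))  # np.prod(()) is 1, so scalar leaves work too
    return flat[offset:offset + size].reshape(spec), offset + size

# Hypothetical structured observation, of the kind a complex simulator might emit.
example_obs = {
    "position": np.array([1.0, 2.0]),
    "inventory": {"gold": np.array([3.0]), "grid": np.arange(4).reshape(2, 2)},
}
example_spec = layout(example_obs)
example_flat = flatten(example_obs)               # one 1-D vector of length 7
example_restored, _ = unflatten(example_flat, example_spec)
```

Sorting keys in both `layout` and `flatten` keeps the packing order deterministic, so the same spec can be computed once per environment and reused to unpack every observation.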