PyTupli: A Scalable Infrastructure for Collaborative Offline Reinforcement Learning Projects

Hannah Markgraf, Michael Eichelbeck, Daria Cappey, Selin Demirtürk, Yara Schattschneider, Matthias Althoff

TL;DR

The paper addresses the lack of standardized infrastructure for managing offline RL datasets tied to Gymnasium benchmarks by introducing PyTupli, a client-server framework with a Python API, JSON-serialized benchmarks, and artifact references. It supports collaboration at scale through fine-grained episode- and tuple-level filtering, role-based access control (RBAC), and a containerized deployment for securely uploading, downloading, and sharing benchmark problems and their associated data. Key contributions include benchmark and artifact management, structured storage and querying of RL tuples, environment wrappers with serialization, and an end-to-end deployment that integrates with existing offline RL ecosystems such as d3rlpy. By lowering the infrastructure overhead of dataset creation, curation, and sharing, PyTupli facilitates reproducible, collaborative offline RL research and industry-academia workflows.

Abstract

Offline reinforcement learning (RL) has gained traction as a powerful paradigm for learning control policies from pre-collected data, eliminating the need for costly or risky online interactions. While many open-source libraries offer robust implementations of offline RL algorithms, they all rely on datasets composed of experience tuples consisting of state, action, next state, and reward. Managing, curating, and distributing such datasets requires suitable infrastructure. Although static datasets exist for established benchmark problems, no standardized or scalable solution supports developing and sharing datasets for novel or user-defined benchmarks. To address this gap, we introduce PyTupli, a Python-based tool to streamline the creation, storage, and dissemination of benchmark environments and their corresponding tuple datasets. PyTupli includes a lightweight client library with defined interfaces for uploading and retrieving benchmarks and data. It supports fine-grained filtering at both the episode and tuple level, allowing researchers to curate high-quality, task-specific datasets. A containerized server component enables production-ready deployment with authentication, access control, and automated certificate provisioning for secure use. By addressing key barriers in dataset infrastructure, PyTupli facilitates more collaborative, reproducible, and scalable offline RL research.
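
To make the two filtering levels named in the abstract concrete, the following is a minimal, library-independent sketch. It does not use PyTupli's API: the names `RLTuple`, `Episode`, `filter_episodes`, and `filter_tuples` are hypothetical and chosen only for illustration, and the toy data is invented. Episode-level filtering keeps or drops whole episodes, whereas tuple-level filtering keeps or drops individual experience tuples within each episode.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class RLTuple:
    """One experience tuple: state, action, reward, next state (hypothetical type)."""
    state: Sequence[float]
    action: Sequence[float]
    reward: float
    next_state: Sequence[float]


@dataclass
class Episode:
    """An ordered list of experience tuples (hypothetical type)."""
    tuples: List[RLTuple]

    @property
    def total_reward(self) -> float:
        return sum(t.reward for t in self.tuples)


def filter_episodes(episodes: List[Episode],
                    keep: Callable[[Episode], bool]) -> List[Episode]:
    """Episode-level filter: keep only whole episodes that satisfy the predicate."""
    return [ep for ep in episodes if keep(ep)]


def filter_tuples(episodes: List[Episode],
                  keep: Callable[[RLTuple], bool]) -> List[Episode]:
    """Tuple-level filter: keep matching tuples and drop episodes left empty."""
    filtered = []
    for ep in episodes:
        kept = [t for t in ep.tuples if keep(t)]
        if kept:
            filtered.append(Episode(kept))
    return filtered


# Toy data, invented for this example.
episodes = [
    Episode([RLTuple([0.0], [1.0], 1.0, [0.1]),
             RLTuple([0.1], [1.0], 2.0, [0.2])]),
    Episode([RLTuple([0.0], [-1.0], -1.0, [-0.1]),
             RLTuple([-0.1], [1.0], 0.5, [0.0])]),
]

# Keep only episodes whose return is positive.
good_episodes = filter_episodes(episodes, lambda ep: ep.total_reward > 0)

# Keep only tuples with a positive reward, across all episodes.
positive_tuples = filter_tuples(episodes, lambda t: t.reward > 0)
```

In this sketch, the episode-level filter discards the second episode entirely because its return is negative, while the tuple-level filter retains that episode's single positively rewarded tuple. A server-backed tool would presumably evaluate such predicates as database queries rather than in client memory; that detail is an assumption here, not something stated in the abstract.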

Paper Structure

This paper contains 20 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Overview of the core functionalities of PyTupli.
  • Figure 2: Simplified UML class diagram of the client-side architecture. Some relations are omitted for clarity but can be derived from the given types.
  • Figure 3: UML component diagram of the production deployment.
  • Figure 4: Usage example: Workflow for Company B.
  • Figure 5: Usage example: Workflow for University A.