NS-Gym: Open-Source Simulation Environments and Benchmarks for Non-Stationary Markov Decision Processes

Nathaniel S. Keplinger; Baiting Luo; Iliyas Bektas; Yunuo Zhang; Kyle Hollins Wray; Aron Laszka; Abhishek Dubey; Ayan Mukhopadhyay

TL;DR

NS-Gym introduces an open-source, Gymnasium-compatible toolkit for non-stationary MDPs (NS-MDPs) that decouples non-stationary environmental dynamics from the agent via schedulers, update functions, and notification mechanisms. It provides custom observation spaces and planning-model support to facilitate model-based planning under changing dynamics, and it offers standardized problem instances and interfaces for reproducible benchmarking. The paper surveys NS-MDP characteristics, presents canonical problem types with varying observability and change frequency, and demonstrates benchmark results across six algorithms on several NS-MDP bases. These contributions enable robust evaluation and comparison of adaptive decision-making methods, supporting a public leaderboard and ongoing development in non-stationary reinforcement learning and planning.

Abstract

In many real-world applications, agents must make sequential decisions in environments where conditions change due to various exogenous factors. These non-stationary environments pose significant challenges to traditional decision-making models, which typically assume stationary dynamics. Non-stationary Markov decision processes (NS-MDPs) offer a framework to model and solve decision problems under such changing conditions. However, the lack of standardized benchmarks and simulation tools has hindered systematic evaluation and progress in this field. We present NS-Gym, the first simulation toolkit designed explicitly for NS-MDPs, integrated within the popular Gymnasium framework. In NS-Gym, we separate the evolution of the environmental parameters that characterize non-stationarity from the agent's decision-making module, allowing for modular and flexible adaptations to dynamic environments. We review prior work in this domain and present a toolkit encapsulating key problem characteristics and types in NS-MDPs. This toolkit is the first effort to develop a set of standardized interfaces and benchmark problems to enable consistent and reproducible evaluation of algorithms under non-stationary conditions. We also benchmark six algorithmic approaches from prior work on NS-MDPs using NS-Gym. Our vision is that NS-Gym will enable researchers to assess the adaptability and robustness of their decision-making algorithms to non-stationary conditions.
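The separation between parameter evolution and the agent can be illustrated with a minimal, library-free sketch. It does not use or reproduce the NS-Gym API: every name below (`periodic_scheduler`, `increment_update`, `Observation`, `NonStationaryChain`) is hypothetical, and the toy random-walk dynamics are an assumption chosen only for brevity. A scheduler decides when a parameter changes, an update function decides how it changes, and a notification flag controls whether the agent is told.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# NOTE: all names here are hypothetical illustrations, not NS-Gym's API.


def periodic_scheduler(period: int) -> Callable[[int], bool]:
    """Scheduler: returns True at every `period`-th time step (t > 0)."""
    return lambda t: t > 0 and t % period == 0


def increment_update(step: float) -> Callable[[float, int], float]:
    """Update function: adds a fixed step to the current parameter value."""
    return lambda theta, t: theta + step


@dataclass
class Observation:
    state: int
    env_change: Optional[bool]     # None when the agent is not notified
    delta_change: Optional[float]  # None when the agent is not notified


class NonStationaryChain:
    """Toy 1-D walk: the chosen move is reversed with probability `theta`."""

    def __init__(self, theta: float, scheduler, update_fn,
                 notify: bool = True, seed: int = 0):
        self.theta = theta
        self.scheduler = scheduler
        self.update_fn = update_fn
        self.notify = notify
        self.rng = random.Random(seed)
        self.t = 0
        self.state = 0

    def step(self, action: int) -> Tuple[Observation, float]:
        """Apply `action` (+1 or -1) after evolving the parameter if scheduled."""
        self.t += 1
        changed, delta = False, 0.0
        if self.scheduler(self.t):
            # Clamp to [0, 1] because theta is used as a probability.
            new_theta = min(1.0, max(0.0, self.update_fn(self.theta, self.t)))
            delta = new_theta - self.theta
            changed = delta != 0.0
            self.theta = new_theta
        move = action if self.rng.random() >= self.theta else -action
        self.state += move
        reward = float(self.state)
        obs = Observation(
            self.state,
            changed if self.notify else None,
            delta if self.notify else None,
        )
        return obs, reward


env = NonStationaryChain(0.1, periodic_scheduler(3), increment_update(0.2))
for _ in range(6):
    obs, reward = env.step(+1)
    print(env.t, obs.env_change, round(env.theta, 2))
```

Constructing the same environment with `notify=False` leaves the dynamics untouched but hides the change flags, which is one simple way to vary observability without modifying the agent.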

Paper Structure (29 sections, 17 figures, 11 tables)

Introduction
Characteristics of NS-MDPs and Prior Work
Framework Description
Background
Overview
Problem Types and Notifications
Custom Observation for NS-MDPs
Schedulers and Parameter Update Functions
Experimental Pipeline
Non-Stationary Environment Details
Benchmark Experiments
Baseline Algorithms
Results
Conclusion
Description of NS-Gym Environments
...and 14 more sections

Figures (17)

Figure 1: An overall framework for non-stationary Markov decision processes. At time $t$, the agent observes the state $s_t \in S$ and takes an action $a \in A$. The environment emits a reward signal $r(s_t, a)$ and transitions to the next state $s_{t+1}$. The transition and the reward are governed by parameters $\theta$, which do not necessarily have a stationary distribution. In general, the evolution of $\theta$ occurs through a semi-Markov chain whose sojourn time is distributed as $S$, which might be non-memoryless. Depending on the problem, the agent can detect and/or observe the evolution of $\theta$.
Figure 2: A sequence diagram of the agent-environment interaction in NS-Gym. Steps 4-9 in the diagram show how parameters are updated. Step 6 checks the current MDP time step and notifies if the parameter should be updated. Step 9 returns Observation and Reward types outlined in Table \ref{tab:custom_observation}.
Figure 3: The Gymnasium CartPole environment.
Figure 4: The Gymnasium MountainCar environment.
Figure 5: The Gymnasium Acrobot environment.
...and 12 more figures
