NS-Gym: Open-Source Simulation Environments and Benchmarks for Non-Stationary Markov Decision Processes
Nathaniel S. Keplinger, Baiting Luo, Iliyas Bektas, Yunuo Zhang, Kyle Hollins Wray, Aron Laszka, Abhishek Dubey, Ayan Mukhopadhyay
TL;DR
NS-Gym introduces an open-source, Gymnasium-compatible toolkit for non-stationary MDPs (NS-MDPs) that decouples non-stationary environmental dynamics from the agent via schedulers, update functions, and notification mechanisms. It provides custom observation spaces and planning-model support to facilitate model-based planning under changing dynamics, and it offers standardized problem instances and interfaces for reproducible benchmarking. The paper surveys NS-MDP characteristics, presents canonical problem types with varying observability and change frequency, and demonstrates benchmark results across six algorithms on several NS-MDP base environments. These contributions enable robust evaluation and comparison of adaptive decision-making methods, supporting a public leaderboard and ongoing development in non-stationary reinforcement learning and planning.
Abstract
In many real-world applications, agents must make sequential decisions in environments where conditions are subject to change due to various exogenous factors. These non-stationary environments pose significant challenges to traditional decision-making models, which typically assume stationary dynamics. Non-stationary Markov decision processes (NS-MDPs) offer a framework to model and solve decision problems under such changing conditions. However, the lack of standardized benchmarks and simulation tools has hindered systematic evaluation and progress in this field. We present NS-Gym, the first simulation toolkit designed explicitly for NS-MDPs, integrated within the popular Gymnasium framework. In NS-Gym, we separate the evolution of the environmental parameters that characterize non-stationarity from the agent's decision-making module, allowing for modular and flexible adaptations to dynamic environments. We review prior work in this domain and present a toolkit encapsulating key problem characteristics and types in NS-MDPs. This toolkit is the first effort to develop a set of standardized interfaces and benchmark problems to enable consistent and reproducible evaluation of algorithms under non-stationary conditions. We also benchmark six algorithmic approaches from prior work on NS-MDPs using NS-Gym. Our vision is that NS-Gym will enable researchers to assess the adaptability and robustness of their decision-making algorithms to non-stationary conditions.
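To make the separation between parameter evolution and the agent concrete, the following is a minimal, self-contained sketch in plain Python. It does not use NS-Gym or Gymnasium, and every name in it (`PeriodicScheduler`, `IncrementUpdate`, `NonStationaryChain`, the `env_change` field) is a hypothetical illustration rather than the toolkit's actual API. It assumes a scheduler decides when a parameter changes, an update function decides how it changes, and an optional notification flag tells the agent that a change occurred.

```python
import random
from dataclasses import dataclass

# NOTE: all classes and field names below are hypothetical illustrations,
# not NS-Gym's actual interfaces.


@dataclass
class PeriodicScheduler:
    """Decides WHEN a parameter changes: every `period` time steps."""
    period: int

    def __call__(self, t: int) -> bool:
        return t > 0 and t % self.period == 0


@dataclass
class IncrementUpdate:
    """Decides HOW a parameter changes: add `step`, clipped to [lo, hi]."""
    step: float
    lo: float = 0.0
    hi: float = 1.0

    def __call__(self, value: float) -> float:
        return min(self.hi, max(self.lo, value + self.step))


class NonStationaryChain:
    """Toy 1-D chain. Actions are +1 or -1; with probability `slip`
    the chosen move is reversed. `slip` drifts over time."""

    def __init__(self, scheduler, update, slip=0.0, length=5,
                 notify=True, seed=0):
        self.scheduler = scheduler
        self.update = update
        self.slip = slip
        self.length = length
        self.notify = notify
        self.rng = random.Random(seed)
        self.t = 0
        self.pos = 0

    def _obs(self, changed: bool) -> dict:
        # The notification is part of the observation; None means the
        # agent is not told whether the dynamics changed.
        return {"state": self.pos,
                "env_change": changed if self.notify else None}

    def reset(self) -> dict:
        self.t = 0
        self.pos = 0
        return self._obs(False)

    def step(self, action: int):
        self.t += 1
        # Parameter evolution happens here, independent of the agent.
        changed = self.scheduler(self.t)
        if changed:
            self.slip = self.update(self.slip)
        move = action if self.rng.random() >= self.slip else -action
        self.pos = min(self.length - 1, max(0, self.pos + move))
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0
        return self._obs(changed), reward, done
```

In this sketch the agent only ever calls `reset` and `step`; swapping in a different scheduler or update function changes the form of non-stationarity without touching agent code, which is the modularity the abstract describes.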
