Mining-Gym: A Configurable RL Benchmarking Environment for Truck Dispatch Scheduling
Chayan Banerjee, Kien Nguyen, Clinton Fookes
TL;DR
Mining-Gym tackles the lack of standardized benchmarks for reinforcement learning in mining truck dispatch by integrating a high-fidelity discrete-event simulator (DES) with an OpenAI Gym-compatible RL interface. The framework models LHDRQ cycles, resource competition, and disruptions within an event-driven MDP, enabling RL policies (PPO) to be trained and evaluated alongside classical baselines. Empirical results across six stress-test scenarios show that RL policies improve productivity by about $5.7\%$ and cut mean queue lengths by roughly $24.4\%$, with the largest gains under combined resource constraints, which indicates robust, adaptive decision-making. The work advances reproducible, scalable RL benchmarking in mining and provides a foundation for multi-objective, uncertainty-aware evaluation and potential digital-twin integration for industrial deployment.
Abstract
Optimizing the mining process, particularly truck dispatch scheduling, is a key driver of efficiency in open-pit operations. However, the dynamic and stochastic nature of these environments, with uncertainties such as equipment failures, truck maintenance, and variable haul cycle times, challenges traditional optimization methods. While Reinforcement Learning (RL) shows strong potential for adaptive decision-making in mining logistics, practical deployment requires evaluation in realistic, customizable simulation environments. The lack of standardized benchmarking hampers fair algorithm comparison, reproducibility, and real-world applicability of RL solutions. To address this, we present Mining-Gym, a configurable, open-source benchmarking environment for training, testing, and evaluating RL algorithms in mining process optimization. Built on Salabim-based Discrete Event Simulation (DES) and integrated with Gymnasium, Mining-Gym captures mining-specific uncertainties through an event-driven decision-point architecture. It offers a GUI for parameter configuration, data logging, and real-time visualization, supporting reproducible evaluation of RL strategies and heuristic baselines. We validate Mining-Gym by comparing classical heuristics with RL-based scheduling across six scenarios ranging from normal operation to severe equipment failures. Results show that Mining-Gym is an effective, reproducible testbed that enables fair evaluation of adaptive decision-making and demonstrates the strong performance potential of RL-trained schedulers.
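To make the event-driven decision-point idea concrete, the following is a minimal sketch of how a dispatch environment can advance a simulation clock from one decision point to the next while exposing a Gymnasium-style `reset`/`step` interface. It is not Mining-Gym's code and does not use Salabim or Gymnasium: the class `ToyDispatchEnv`, the helper names, the timing ranges, and the reward shape are all hypothetical, chosen only for illustration, and the event queue is a plain `heapq` from the Python standard library.

```python
import heapq
import random


class ToyDispatchEnv:
    """Toy event-driven truck dispatch environment (illustrative assumption only)."""

    def __init__(self, n_trucks=3, n_shovels=2, horizon=480.0, seed=0):
        self.n_trucks = n_trucks
        self.n_shovels = n_shovels
        self.horizon = horizon  # simulated minutes per episode
        self.seed = seed

    def reset(self):
        self.rng = random.Random(self.seed)
        self.dispatches = 0
        # Time at which each shovel finishes all work committed so far.
        self.shovel_free_at = [0.0] * self.n_shovels
        # Event queue of (time, truck_id): a truck is ready to be dispatched.
        self.events = [(0.0, t) for t in range(self.n_trucks)]
        heapq.heapify(self.events)
        return self._advance(), {}

    def _advance(self):
        # Jump the clock straight to the next decision point (no fixed time step).
        self.now, self.truck = heapq.heappop(self.events)
        # Observation: committed busy time remaining at each shovel.
        return [max(0.0, f - self.now) for f in self.shovel_free_at]

    def step(self, action):
        # action is the index of the shovel the waiting truck is sent to.
        arrive = self.now + self.rng.uniform(4.0, 6.0)   # travel to shovel
        # Simplification: trucks are served in dispatch order, not arrival order.
        start = max(arrive, self.shovel_free_at[action])
        wait = start - arrive                            # queueing delay
        loaded = start + self.rng.uniform(2.0, 4.0)      # loading time
        self.shovel_free_at[action] = loaded
        ready = loaded + self.rng.uniform(8.0, 12.0)     # haul, dump, return
        heapq.heappush(self.events, (ready, self.truck))
        self.dispatches += 1
        reward = 1.0 - 0.1 * wait                        # hypothetical reward shape
        obs = self._advance()
        truncated = self.now >= self.horizon             # episode ends at the horizon
        info = {"time": self.now, "dispatches": self.dispatches}
        return obs, reward, False, truncated, info


def shortest_queue(obs):
    """Heuristic baseline: pick the shovel with the least committed work."""
    return min(range(len(obs)), key=obs.__getitem__)


def run_episode(env, policy):
    obs, info = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total, info


if __name__ == "__main__":
    total, info = run_episode(ToyDispatchEnv(seed=0), shortest_queue)
    print(f"return={total:.2f}, dispatches={info['dispatches']}")
```

Because `step` is only called when a truck needs an assignment, the agent sees irregularly spaced decision points rather than fixed time steps, which is the property the event-driven MDP formulation relies on. A learned policy and a heuristic such as `shortest_queue` can be compared by passing each to `run_episode` with the same seed.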
