Mining-Gym: A Configurable RL Benchmarking Environment for Truck Dispatch Scheduling

Chayan Banerjee, Kien Nguyen, Clinton Fookes

TL;DR

Mining-Gym tackles the lack of standardized benchmarks for reinforcement learning in mining truck dispatch by integrating a high-fidelity discrete-event simulator (DES) with an OpenAI Gym-compatible RL interface. The framework models Load-Haul-Dump-Return-Query (LHDRQ) cycles, resource competition, and disruptions within an event-driven MDP, enabling RL policies (PPO) to be trained and evaluated alongside classical baselines. Empirical results across six stress-test scenarios show RL policies improve productivity by about $5.7\%$ and cut mean queue lengths by roughly $24.4\%$, with the largest gains under combined resource constraints, demonstrating robust, adaptive decision-making. The work advances reproducible, scalable RL benchmarking in mining, providing a foundation for multi-objective, uncertainty-aware evaluation and potential digital-twin integration for industrial deployment.

Abstract

Optimizing the mining process -- particularly truck dispatch scheduling -- is a key driver of efficiency in open-pit operations. However, the dynamic and stochastic nature of these environments, with uncertainties such as equipment failures, truck maintenance, and variable haul cycle times, challenges traditional optimization. While Reinforcement Learning (RL) shows strong potential for adaptive decision-making in mining logistics, practical deployment requires evaluation in realistic, customizable simulation environments. The lack of standardized benchmarking hampers fair algorithm comparison, reproducibility, and real-world applicability of RL solutions. To address this, we present Mining-Gym -- a configurable, open-source benchmarking environment for training, testing, and evaluating RL algorithms in mining process optimization. Built on Salabim-based Discrete Event Simulation (DES) and integrated with Gymnasium, Mining-Gym captures mining-specific uncertainties through an event-driven decision-point architecture. It offers a GUI for parameter configuration, data logging, and real-time visualization, supporting reproducible evaluation of RL strategies and heuristic baselines. We validate Mining-Gym by comparing classical heuristics with RL-based scheduling across six scenarios from normal operation to severe equipment failures. Results show it is an effective, reproducible testbed, enabling fair evaluation of adaptive decision-making and demonstrating the strong performance potential of RL-trained schedulers.

Paper Structure

This paper contains 32 sections, 4 equations, 6 figures, 10 tables, 1 algorithm.

Figures (6)

  • Figure 1: (a) Comprehensive system architecture: Displays all key components of Mining-Gym, including the graphical user interface (GUI), the generated configuration file used to initialize the environment, and the dashboard with real-time visualizations. (b) OpenAI Gym-compatible RL interface: Illustrates how Mining-Gym integrates with OpenAI Gym by adapting simulation signals into the standard reset and step methods. This compatibility allows seamless integration with popular RL libraries, such as Stable-Baselines3, enabling easy training and testing of RL models.
  • Figure 2: (A) Simplified mine-site simulation logic of Mining-Gym, showing three key components: (1) Resource Handler managing resource availability and assignments, (2) Preemption Handler detecting breakdowns and managing repair processes. (B) Load-Haul-Dump-Return-Query (LHDRQ) cycle illustrating the truck's journey through the mining process, which begins with querying the dispatcher for assignments, followed by loading material, hauling to the destination, dumping, and returning empty. Breakdown events, managed by the Preemption Handler, can interrupt operations at any stage. (C) DES-RL interaction flow illustrating how the RL policy integrates with the DES. At decision points, the environment state is processed by the RL policy to determine resource assignments. Immediate (step) rewards guide learning during simulation, while the episodic reward at the end of the shift (episode) updates the policy before the environment is reset.
  • Figure 3: (a) Mining-Gym graphical user interface (GUI) and the generated configuration file. (b) Real-time representative visualization of the mine site. The screengrab shows trucks queued at shovels, with the rightmost shovel offline, and trucks transiting between dump sites, crushers, and shovels.
  • Figure 4: Conceptual Diagram of Minesite
  • Figure 5: Comparison of "Mean Trips per hour" and "Mean Truck Queue Length" over a 6-hour shift under six failure scenarios: (a) No failures $\{0,0\}$, (b) Moderate Truck Stress $\{0,15\}$, (c) Single Shovel Loss $\{1,0\}$, (d) Combined Stress $\{1,15\}$, (e) Severe Disruption $\{2,30\}$, and (f) Critical Bottleneck $\{3,0\}$. Shovel failures occur between 100 and 150 min, and truck failures occur between 150 and 200 min, in every scenario that includes them. Scenarios progress from baseline operation to severe capacity constraints on loading and hauling.
  • ...and 1 more figure
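
The reset/step interface and the decision-point flow described in the captions of Figures 1 and 2 can be illustrated with a toy example. The sketch below is not the Mining-Gym implementation: it uses only the Python standard library instead of Salabim and Gymnasium, and the class name `ToyDispatchEnv`, the helper functions, the fleet size, and all load and travel time ranges are assumptions made for illustration. It keeps only the ideas named in the captions: the clock jumps from one truck query to the next, the action selects a shovel, the step reward penalizes queueing, and a shovel outage window of 100 to 150 min can be switched on. Truck failures are omitted.

```python
import heapq
import random

SHIFT_MIN = 360.0               # 6-hour shift, as in the Figure 5 caption
SHOVEL_OUTAGE = (100.0, 150.0)  # shovel failure window from the caption (minutes)


class ToyDispatchEnv:
    """Toy event-driven dispatch environment (hypothetical, not Mining-Gym)."""

    def __init__(self, n_trucks=6, n_shovels=3, failed_shovels=0, seed=0):
        self.n_trucks = n_trucks
        self.n_shovels = n_shovels
        self.failed_shovels = failed_shovels  # shovels 0..failed_shovels-1 go offline
        self.rng = random.Random(seed)

    def _obs(self):
        # Remaining busy time of each shovel at the current decision point.
        return tuple(max(0.0, t - self.now) for t in self.free_at)

    def reset(self):
        self.dispatches = 0
        self.free_at = [0.0] * self.n_shovels
        # Every truck starts by querying the dispatcher at time 0.
        self.events = [(0.0, truck) for truck in range(self.n_trucks)]
        heapq.heapify(self.events)
        self.now, self.truck = heapq.heappop(self.events)
        return self._obs(), {"truck": self.truck}

    def step(self, action):
        # Query -> Load: the truck waits until the chosen shovel is free.
        start = max(self.now, self.free_at[action])
        lo, hi = SHOVEL_OUTAGE
        if action < self.failed_shovels and lo <= start < hi:
            start = hi  # shovel is offline; loading starts after the outage
        wait = start - self.now
        self.free_at[action] = start + self.rng.uniform(4.0, 6.0)  # assumed load time
        # Haul, dump, and return are lumped into one assumed travel time.
        back = self.free_at[action] + self.rng.uniform(15.0, 25.0)
        heapq.heappush(self.events, (back, self.truck))
        self.dispatches += 1
        # Advance the clock to the next decision point (the next truck query).
        self.now, self.truck = heapq.heappop(self.events)
        terminated = self.now >= SHIFT_MIN
        info = {"truck": self.truck, "dispatches": self.dispatches}
        # Gymnasium-style return: obs, reward, terminated, truncated, info.
        return self._obs(), -wait, terminated, False, info


def shortest_wait(obs):
    """Heuristic baseline: send the truck to the shovel that frees up first."""
    return min(range(len(obs)), key=obs.__getitem__)


def run_episode(env, policy):
    """Roll out one shift and return (total reward, number of dispatches)."""
    obs, info = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _truncated, info = env.step(policy(obs))
        total += reward
    return total, info["dispatches"]


if __name__ == "__main__":
    for failed in (0, 1):
        total, n = run_episode(ToyDispatchEnv(failed_shovels=failed), shortest_wait)
        print(f"failed_shovels={failed}: total wait penalty {total:.1f}, dispatches {n}")
```

Because `step` returns control only at the next truck query, a policy makes one decision per dispatch request rather than one per fixed time tick, which is the event-driven pattern the Figure 2 caption describes. A trained policy would replace `shortest_wait` in `run_episode`.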