Reinforcement Learning with Curriculum-inspired Adaptive Direct Policy Guidance for Truck Dispatching

Shi Meng, Bin Tian, Xiaotong Zhang

TL;DR

This work addresses the challenge of dispatching trucks in open-pit mines using reinforcement learning under nonuniform decision intervals and sparse rewards. It introduces Curriculum-inspired Adaptive Direct Policy Guidance, a policy-based learning framework built on PPO that incorporates $\Delta t$-aware TD/GAE, a Shortest Processing Time teacher policy, and an adaptive guidance coefficient to steer exploration. Experiments in OpenMines show about a 10% production gain and faster convergence than standard PPO, across both sparse and dense reward settings, demonstrating robustness to reward design. The method generalizes beyond PPO to other RL algorithms and points to future work on more sophisticated architectures, including language-guided RL for instruction-following.

Abstract

Efficient truck dispatching via Reinforcement Learning (RL) in open-pit mining is often hindered by reliance on complex reward engineering and value-based methods. This paper introduces Curriculum-inspired Adaptive Direct Policy Guidance, a novel curriculum learning strategy for policy-based RL to address these issues. We adapt Proximal Policy Optimization (PPO) for mine dispatching's uneven decision intervals using time deltas in Temporal Difference and Generalized Advantage Estimation, and employ a Shortest Processing Time teacher policy for guided exploration via policy regularization and adaptive guidance. Evaluations in OpenMines demonstrate our approach yields a 10% performance gain and faster convergence over standard PPO across sparse and dense reward settings, showcasing improved robustness to reward design. This direct policy guidance method provides a general and effective curriculum learning technique for RL-based truck dispatching, enabling future work on advanced architectures.
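The abstract mentions using time deltas in Temporal Difference and Generalized Advantage Estimation to handle uneven decision intervals. The sketch below illustrates one common way to do this, and it is an assumption rather than the paper's exact formulation: the discount applied at step $t$ is $\gamma^{\Delta t_t}$ and the GAE trace decays by $(\gamma\lambda)^{\Delta t_t}$, where $\Delta t_t$ is the elapsed time between decision $t$ and decision $t+1$. The function name `time_aware_gae` and all of its parameters are hypothetical, and the code covers a single episode segment with no terminal-state handling.

```python
import numpy as np


def time_aware_gae(rewards, values, last_value, dts, gamma=0.99, lam=0.95):
    """Advantage estimates for decisions separated by irregular time gaps.

    Assumed form (not taken from the paper):
        td_t = r_t + gamma**dt_t * V(s_{t+1}) - V(s_t)
        A_t  = td_t + (gamma * lam)**dt_t * A_{t+1}
    With every dt_t equal to 1 this reduces to standard GAE.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    dts = np.asarray(dts, dtype=float)

    n = len(rewards)
    # V(s_{t+1}) for each step; the final step bootstraps from last_value.
    next_values = np.append(values[1:], last_value)

    advantages = np.zeros(n)
    running = 0.0
    for t in reversed(range(n)):
        discount = gamma ** dts[t]    # longer gaps are discounted more
        trace_decay = lam ** dts[t]   # and decay the GAE trace more
        td_error = rewards[t] + discount * next_values[t] - values[t]
        running = td_error + discount * trace_decay * running
        advantages[t] = running

    returns = advantages + values     # regression targets for the critic
    return advantages, returns


if __name__ == "__main__":
    # Two decisions; the first is followed by a gap twice as long.
    adv, ret = time_aware_gae(
        rewards=[1.0, 1.0], values=[0.0, 0.0], last_value=0.0,
        dts=[2.0, 1.0], gamma=0.5, lam=1.0,
    )
    print(adv, ret)
```

Exponentiating the discount by the elapsed time keeps a reward that arrives after a long haul from being weighted the same as one that arrives after a short haul, which is the motivation for a time-aware estimator when decision points are not evenly spaced.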

Paper Structure

This paper contains 16 sections, 5 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Fleet Ablation Comparison
  • Figure 2: Production Performance During Training