Highway Graph to Accelerate Reinforcement Learning
Zidu Yin, Zhen Zhang, Dong Gong, Stefano V. Albrecht, Javen Q. Shi
TL;DR
The paper introduces the highway graph to accelerate reinforcement learning by compressing the empirical state-transition graph into a smaller highway graph, enabling multi-step value propagation and faster convergence of value updates. It defines the highway MDP and a graph Bellman operator to perform complete value updates on the reduced graph, with convergence guarantees. Empirical results across four task categories show speedups of roughly 10x to over 150x while preserving or improving returns, and a neural network re-parameterization (HG-Q) provides storage-efficient, generalizable policy initialization. The approach yields significant training-time gains, strong sample efficiency, and improved generalization, with a clear path to extending the method to more complex and stochastic environments. The work offers a practical, scalable mechanism to enhance VI-based RL and lays groundwork for broader integration into RL algorithms.
Abstract
Reinforcement Learning (RL) algorithms often struggle with low training efficiency. A common approach to address this challenge is integrating model-based planning algorithms, such as Monte Carlo Tree Search (MCTS) or Value Iteration (VI), into the environmental model. However, VI requires iterating over a large tensor, updating the value of each preceding state from its succeeding state through value propagation, which is computationally intensive. To enhance RL training efficiency, we propose improving the efficiency of the value learning process. In deterministic environments with discrete state and action spaces, we observe that on the sampled empirical state-transition graph, a non-branching sequence of transitions, termed a highway, can take the agent to another state without deviation through intermediate states. On these non-branching highways, the value-updating process can be streamlined into a single-step operation, eliminating the need for step-by-step updates. Building on this observation, we introduce the highway graph to model state transitions. The highway graph compresses the transition model into a compact representation in which a single edge can encapsulate multiple state transitions, enabling value propagation across multiple time steps in a single iteration. By integrating the highway graph into RL, the training process is significantly accelerated, particularly in the early stages of training. Experiments across four categories of environments demonstrate that our method learns significantly faster than established and state-of-the-art RL algorithms (often by a factor of 10 to 150) while maintaining equal or superior expected returns. Furthermore, a deep neural network-based agent trained using the highway graph exhibits improved generalization capabilities and reduced storage costs. Code is publicly available at https://github.com/coodest/highwayRL.
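The compression idea described above can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function names `compress_to_highways` and `highway_value_iteration`, the dictionary layout of the transition graph, and the toy environment are all assumptions made for illustration. The sketch assumes a deterministic empirical graph and treats a state as intermediate when it has exactly one incoming and one outgoing transition. Each run of intermediate states is merged into one highway edge that stores the discounted reward accumulated along it and its length in steps.

```python
from collections import defaultdict


def compress_to_highways(transitions, gamma):
    """Merge non-branching runs of transitions into single highway edges.

    transitions: {state: {action: (next_state, reward)}}, deterministic.
    Returns {state: [(target_state, discounted_reward, num_steps), ...]}
    containing only the non-intermediate ("intersection") states.
    """
    in_degree = defaultdict(int)
    for edges in transitions.values():
        for nxt, _ in edges.values():
            in_degree[nxt] += 1
    states = set(transitions) | set(in_degree)

    def is_intermediate(s):
        # Exactly one way in and one way out: no branching at s.
        return in_degree[s] == 1 and len(transitions.get(s, {})) == 1

    highways = {}
    for s in states:
        if is_intermediate(s):
            continue
        highways[s] = []
        for nxt, reward in transitions.get(s, {}).values():
            ret, steps = reward, 1
            # Follow the run until the next intersection is reached.
            while is_intermediate(nxt):
                nxt, r = next(iter(transitions[nxt].values()))
                ret += gamma ** steps * r
                steps += 1
            highways[s].append((nxt, ret, steps))
    return highways


def highway_value_iteration(highways, gamma, sweeps=100, tol=1e-9):
    """Value iteration over highway edges (hypothetical helper).

    One update crosses a whole highway:
    V(s) = max over edges of (discounted_reward + gamma**num_steps * V(target)).
    """
    values = {s: 0.0 for s in highways}
    for _ in range(sweeps):
        delta = 0.0
        for s, edges in highways.items():
            if not edges:  # terminal state in the sampled graph
                continue
            new = max(ret + gamma ** k * values[t] for t, ret, k in edges)
            delta = max(delta, abs(new - values[s]))
            values[s] = new
        if delta < tol:
            break
    return values


# Toy graph (assumed): 0 -> 1 -> 2 -> 3 -> 4 with reward 1.0 on the last
# step, plus a short branch 0 -> 5 with reward 0.5.
gamma = 0.9
transitions = {
    0: {"a": (1, 0.0), "b": (5, 0.5)},
    1: {"a": (2, 0.0)},
    2: {"a": (3, 0.0)},
    3: {"a": (4, 1.0)},
}
highways = compress_to_highways(transitions, gamma)
values = highway_value_iteration(highways, gamma)
```

In this toy graph the four-step path from state 0 to state 4 collapses into one highway edge, so the reward at the end of the path reaches state 0 in a single update rather than after four step-by-step sweeps. The sketch skips isolated cycles that consist only of intermediate states, and it does not cover the stochastic case or the neural network re-parameterization.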
