Breaking the Grid: Distance-Guided Reinforcement Learning in Large Discrete and Hybrid Action Spaces

Heiko Hoppe; Fabian Akkerman; Wouter van Heeswijk; Maximilian Schiffer

TL;DR

DGRL combines Sampled Dynamic Neighborhoods (SDN) and Distance-Based Updates (DBU) to enable efficient RL in action spaces with up to $10^{20}$ actions. It improves performance by up to 66% over state-of-the-art benchmarks across regularly and irregularly structured environments, while also improving convergence speed and computational complexity.

Abstract

Reinforcement Learning is increasingly applied to logistics, scheduling, and recommender systems, but standard algorithms struggle with the curse of dimensionality in such large discrete action spaces. Existing algorithms typically rely on restrictive grid-based structures or computationally expensive nearest-neighbor searches, limiting their effectiveness in high-dimensional or irregularly structured domains. We propose Distance-Guided Reinforcement Learning (DGRL), combining Sampled Dynamic Neighborhoods (SDN) and Distance-Based Updates (DBU) to enable efficient RL in spaces with up to $10^{20}$ actions. Unlike prior methods, SDN leverages a semantic embedding space to perform stochastic volumetric exploration, provably providing full support over a local trust region. Complementing this, DBU transforms policy optimization into a stable regression task, decoupling gradient variance from action space cardinality and guaranteeing monotonic policy improvement. DGRL naturally generalizes to hybrid continuous-discrete action spaces without requiring hierarchical dependencies. We demonstrate performance improvements of up to 66% against state-of-the-art benchmarks across regularly and irregularly structured environments, while simultaneously improving convergence speed and computational complexity.
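The two ideas named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes an integer-grid action space, a uniform sampler over a Chebyshev (L-infinity) ball around a continuous proto-action, and a squared-Euclidean regression loss toward the best sampled candidate. The function names (`sample_chebyshev_neighborhood`, `select_target_action`, `distance_loss`), the bounds `low`/`high`, and the critic `q_fn` are hypothetical and chosen for illustration only.

```python
import numpy as np


def sample_chebyshev_neighborhood(proto_action, radius, num_samples, rng, low, high):
    """Sample integer actions from an L-infinity ball around a proto-action.

    Assumption: actions live on an integer grid clipped to [low, high].
    Offsets are drawn independently per dimension, so candidates cover the
    whole box around the rounded proto-action, not only its axes.
    """
    center = np.clip(np.rint(proto_action).astype(int), low, high)
    offsets = rng.integers(-radius, radius + 1, size=(num_samples, center.size))
    candidates = np.clip(center + offsets, low, high)
    # Always keep the rounded proto-action itself as a candidate.
    candidates = np.vstack([center[None, :], candidates])
    # Drop duplicate rows so each candidate is evaluated once.
    return np.unique(candidates, axis=0)


def select_target_action(candidates, q_fn):
    """Return the candidate with the highest value under the critic q_fn."""
    q_values = np.array([q_fn(a) for a in candidates])
    return candidates[int(np.argmax(q_values))]


def distance_loss(proto_action, target_action):
    """Regression loss pulling the proto-action toward the target action.

    Assumption: mean squared Euclidean distance; the paper may use another metric.
    """
    diff = np.asarray(proto_action, dtype=float) - np.asarray(target_action, dtype=float)
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    proto = np.array([2.4, 7.6, 4.9])            # continuous actor output
    optimum = np.array([3, 8, 5])                # toy critic peak (hypothetical)
    q_fn = lambda a: -float(np.sum((a - optimum) ** 2))

    candidates = sample_chebyshev_neighborhood(proto, 1, 64, rng, low=0, high=9)
    target = select_target_action(candidates, q_fn)
    print("target action:", target)
    print("regression loss:", distance_loss(proto, target))
```

In this sketch the cost of one update depends on the number of sampled candidates and the action dimension, not on the total number of actions, which is the property the abstract attributes to sampling a local neighborhood and regressing toward a target action.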

Paper Structure (46 sections, 14 theorems, 41 equations, 8 figures, 7 tables, 2 algorithms)

Introduction
Problem Setup and Notation
Theoretical Framework: Design Principles
Methodology
Sampled Dynamic Neighborhood (SDN)
Distance-Based Updates (DBU)
Unified Treatment of Hybrid Spaces
Theoretical Properties
Numerical Study
Experimental Design.
Empirical Performance and Scalability.
Discrete Action Spaces.
Hybrid Action Spaces.
Computational Complexity.
Ablation Studies and Metric Sensitivity.
...and 31 more sections

Key Result

Proposition 3.1

Let $Q(s, \cdot): \mathcal{A}' \to \mathbb{R}$ be $L_Q$-Lipschitz continuous w.r.t. a metric $d$. Let $a^\star$ be the optimal discrete action, $\hat{a}$ be a continuous proto-action, and $\bar{a}$ be a target action. The value loss of rounding $\hat{a}$ to its nearest neighbor $a_\mathrm{nn}$ is bo

Figures (8)

Figure 1: Schematic representation of SDN.
Figure 2: Schematic representation of DBU.
Figure 3: Results for discrete environments, averaged over 10 random seeds. Titles indicate size and type (structured or irregular) of action space. Legend indicates mapping method and RL algorithm.
Figure 4: Results for hybrid environments, averaged over 10 random seeds. Titles indicate size and type (structured or irregular) of action space. Legend indicates mapping method and RL algorithm.
Figure 5: Schematic overview over full algorithm.
...and 3 more figures

Theorems & Definitions (23)

Proposition 3.1: Approximation Bound via Lipschitz Continuity
Proposition 3.2: Dimensional Invariance of Chebyshev Neighborhoods
Proposition 5.1: Volumetric vs. Axial Support
Theorem 5.2: Removal of Action-Cardinality Dependence
Proposition 5.3: Trust Region Projection
Remark 5.4: Approximate Coordinate Ascent
Proposition 1.1: Approximation Bound via Lipschitz Continuity
proof
Proposition 1.2: Dimensional Invariance of Chebyshev Neighborhoods
proof
...and 13 more
