Dynamic Neighborhood Construction for Structured Large Discrete Action Spaces

Fabian Akkerman; Julius Luy; Wouter van Heeswijk; Maximilian Schiffer

TL;DR

This work proposes Dynamic Neighborhood Construction (DNC), a novel exploitation paradigm for structured large discrete action spaces (SLDAS), together with a scalable neighborhood exploration heuristic that efficiently explores the discrete neighborhood around the continuous proxy action in action spaces with up to $10^{73}$ actions.

Abstract

Large discrete action spaces (LDAS) remain a central challenge in reinforcement learning. Existing solution approaches can handle unstructured LDAS with up to a few million actions. However, many real-world applications in logistics, production, and transportation systems have combinatorial action spaces, whose size grows well beyond millions of actions, even on small instances. Fortunately, such action spaces exhibit structure, e.g., equally spaced discrete resource units. With this work, we focus on handling structured LDAS (SLDAS) with sizes that cannot be handled by current benchmarks: we propose Dynamic Neighborhood Construction (DNC), a novel exploitation paradigm for SLDAS. We present a scalable neighborhood exploration heuristic that utilizes this paradigm and efficiently explores the discrete neighborhood around the continuous proxy action in structured action spaces with up to $10^{73}$ actions. We demonstrate the performance of our method by benchmarking it against three state-of-the-art approaches designed for large discrete action spaces across two distinct environments. Our results show that DNC matches or outperforms state-of-the-art approaches while being computationally more efficient. Furthermore, our method scales to action spaces that so far remained computationally intractable for existing methodologies.
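The idea of exploring a discrete neighborhood around a continuous proxy action can be illustrated with a minimal sketch. This is not the authors' heuristic: it assumes an equally spaced grid per action dimension, perturbs one coordinate at a time by a single step, and picks the best candidate greedily. The function names, the bounds, and the toy critic `toy_q` are all hypothetical and serve only as an illustration.

```python
import numpy as np

def nearest_discrete_action(proxy, low, high, step):
    """Round a continuous proxy action onto an equally spaced grid and clip to bounds."""
    proxy = np.asarray(proxy, dtype=float)
    base = low + np.round((proxy - low) / step) * step
    return np.clip(base, low, high)

def neighbors(base, low, high, step):
    """Actions reached by moving exactly one coordinate by +/- one step, within bounds."""
    out = []
    for i in range(base.size):
        for delta in (-step, step):
            cand = base.copy()
            cand[i] += delta
            if low <= cand[i] <= high:
                out.append(cand)
    return out

def select_action(proxy, q_fn, low=0.0, high=10.0, step=1.0):
    """Greedy choice among the base action and its one-step neighbors (illustrative only)."""
    base = nearest_discrete_action(proxy, low, high, step)
    candidates = [base] + neighbors(base, low, high, step)
    values = [q_fn(c) for c in candidates]
    return candidates[int(np.argmax(values))]

# Hypothetical critic for a fixed state: highest value at action (3, 7).
toy_q = lambda a: -float(np.sum((a - np.array([3.0, 7.0])) ** 2))

chosen = select_action([3.6, 6.4], toy_q)
```

Only the base action and its immediate neighbors are evaluated here, so the number of critic calls grows with the number of action dimensions rather than with the size of the full action space.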

Paper Structure (37 sections, 6 theorems, 15 equations, 12 figures, 3 tables, 3 algorithms)

Introduction
Related Literature
Contribution
Problem Description
Methodology
Generating a Discrete Base Action
Generating Sets of Discrete Neighbors
Evaluating Discrete Action Neighborhoods
Discussion
Experimental Design
Numerical Results
Conclusion
Proofs of Lemmata 1, 2, and 3
Proof
Proof
...and 22 more sections

Key Result

Lemma 3.1

Action similarity $L$ is given by $\sup\limits_{\boldsymbol{a},\boldsymbol{a}^\prime \in \mathcal{A}^\prime,\, \boldsymbol{a} \neq \boldsymbol{a}^\prime}\frac{|Q^{\pi}(\boldsymbol{s},\boldsymbol{a})- Q^{\pi}(\boldsymbol{s},\boldsymbol{a}^\prime)|}{\lVert \boldsymbol{a}- \boldsymbol{a}^\prime\rVert_2}$.
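For a finite candidate set, the supremum in this expression is a maximum over distinct action pairs, so it can be computed by direct enumeration. The sketch below assumes a fixed state, so the critic takes only an action; `action_similarity`, `acts`, and `toy_q` are hypothetical names chosen for illustration and do not come from the paper.

```python
import itertools
import numpy as np

def action_similarity(actions, q_fn):
    """Largest ratio |Q(a) - Q(a')| / ||a - a'||_2 over all pairs of distinct actions."""
    best = 0.0
    for a, b in itertools.combinations(actions, 2):
        dist = np.linalg.norm(a - b)
        if dist > 0:  # skip identical actions, matching the a != a' condition
            best = max(best, abs(q_fn(a) - q_fn(b)) / dist)
    return best

# Hypothetical candidate set and critic for a fixed state.
acts = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 2.0])]
toy_q = lambda a: 3.0 * a[0] + a[1]

L_hat = action_similarity(acts, toy_q)
```

Pairwise enumeration is quadratic in the number of candidates, so it is only practical for small neighborhoods such as the one used in this toy example.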

Figures (12)

Figure 1: Structured and unstructured action spaces in 2D. Each vertex represents an action.
Figure 2: Pipeline for finding discrete actions in SLDAS
Figure 3: Average total expected returns during testing for 50 random seeds over the training iterations. The shaded area represents the 2 standard deviation training seed variance corridor. (Top) maze environment results. (Bottom) real-world environment results.
Figure 4: Illustration of a locally convex neighborhood of $J$ with respect to the perturbed actions $\boldsymbol{a}'$. DNC selects actions based on $Q(\boldsymbol{a}')$ and thus may return any $\boldsymbol{a}'\in\mathcal{A}'$. The convex property guarantees that $J(\boldsymbol{a}'')\leq J(\boldsymbol{a}'),\ \forall \boldsymbol{a}'\in\mathcal{A}'$ for some maximally perturbed action $\boldsymbol{a}''$.
Figure 5: Average total expected returns during testing for 50 random seeds over the training iterations in the maze environment, including the performance of and a hybrid approach. The shaded area represents the 2 standard deviation training seed variance corridor.
...and 7 more figures

Theorems & Definitions (7)

Definition 1
Lemma 3.1
Lemma 3.2
Lemma 3.3
Lemma A.1
Lemma A.2
Lemma A.3
