CANDID DAC: Leveraging Coupled Action Dimensions with Importance Differences in DAC

Philipp Bordne, M. Asif Hasan, Eddie Bergman, Noor Awad, André Biedenkapp

TL;DR

This paper addresses dynamic algorithm configuration (DAC) in high-dimensional action spaces whose action dimensions are coupled and differ in importance, a setting it calls CANDID. It introduces a white-box Piecewise Linear benchmark within the DACBench suite that instantiates the CANDID properties through a weighted aggregation across the $M$ action dimensions (weights $w_m = \lambda^{m-1}$) and an exponential reward $r_t = e^{-c \cdot \text{prederror}(a_t^{1:M})}$, with targets defined by a piecewise linear function over time steps and dimension combinations. To tackle the resulting coordination challenge, the authors develop two SDQN-inspired sequential policies, SAQL and simSDQN, which learn one policy per action dimension and condition each on the actions already chosen for earlier dimensions. They compare these against a single-agent DDQN baseline and an independent Q-learning baseline. In the experiments, the sequential policies outperform both baselines in CANDID settings, scale better as the action space grows, and benefit from selecting actions in order of importance, which suggests a viable way to coordinate high-dimensional DAC problems in practice. The code is publicly available, and the work motivates integration with state-of-the-art MARL methods and communication strategies to improve scalability and coordination further.
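To make the reward structure concrete, the following is a minimal Python sketch of the two formulas quoted above. Only $w_m = \lambda^{m-1}$ and $r_t = e^{-c \cdot \text{prederror}}$ come from the summary; how the weighted aggregation and the prediction error are actually computed in the benchmark is not stated here, so the normalized weighted sum and the absolute error below are assumptions made purely for illustration, as are the function names.

```python
import math


def importance_weights(dim: int, lam: float) -> list[float]:
    """Weights w_m = lam**(m-1) for m = 1..dim (formula from the summary)."""
    return [lam ** m for m in range(dim)]


def exponential_reward(actions, target, lam=0.5, c=1.0):
    """Reward r = exp(-c * pred_error).

    ASSUMPTION: the prediction is a normalized weighted sum of the
    per-dimension action values, and pred_error is the absolute distance
    to the target. The benchmark may define both differently.
    """
    weights = importance_weights(len(actions), lam)
    prediction = sum(w * a for w, a in zip(weights, actions)) / sum(weights)
    pred_error = abs(prediction - target)
    return math.exp(-c * pred_error)


# With lam < 1, a mistake in the first dimension costs more than the same
# mistake in a later dimension, which is the "importance difference".
r_first_correct = exponential_reward([1.0, 0.0], target=1.0)
r_second_correct = exponential_reward([0.0, 1.0], target=1.0)
```

Under these assumptions, `r_first_correct` is larger than `r_second_correct`, because the first dimension carries weight 1 and the second only $\lambda$.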

Abstract

High-dimensional action spaces remain a challenge for dynamic algorithm configuration (DAC). Interdependencies and varying importance between action dimensions are further known key characteristics of DAC problems. We argue that these Coupled Action Dimensions with Importance Differences (CANDID) represent aspects of the DAC problem that are not yet fully explored. To address this gap, we introduce a new white-box benchmark within the DACBench suite that simulates the properties of CANDID. Further, we propose sequential policies as an effective strategy for managing these properties. Such policies factorize the action space and mitigate exponential growth by learning a policy per action dimension. At the same time, these policies accommodate the interdependence of action dimensions by fostering implicit coordination. We show this in an experimental study of value-based policies on our new benchmark. This study demonstrates that sequential policies significantly outperform independent learning of factorized policies in CANDID action spaces. In addition, they overcome the scalability limitations associated with learning a single policy across all action dimensions. The code used for our experiments is available under https://github.com/PhilippBordne/candidDAC.
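The idea of a sequential policy can be illustrated with a small tabular sketch. It is not the paper's SAQL or simSDQN implementation; it only shows the factorization described above, with one value table per action dimension, where each table is indexed by the state together with the actions already chosen for earlier dimensions. The table layout and the names `make_q_tables` and `select_joint_action` are hypothetical.

```python
from collections import defaultdict


def make_q_tables(dim: int, n_act: int):
    """One table per action dimension.

    Each table maps (state, actions_chosen_so_far) to a list of n_act
    action values, initialized to zero on first access.
    """
    return [defaultdict(lambda: [0.0] * n_act) for _ in range(dim)]


def select_joint_action(q_tables, state):
    """Greedy sequential selection.

    Dimension m picks its action after seeing the choices made for
    dimensions 1..m-1, so later dimensions can adapt to earlier ones.
    """
    chosen = []
    for q in q_tables:
        values = q[(state, tuple(chosen))]
        best = max(range(len(values)), key=values.__getitem__)
        chosen.append(best)
    return tuple(chosen)


# Two dimensions with three actions each.
tables = make_q_tables(dim=2, n_act=3)
tables[0][("s0", ())] = [0.0, 1.0, 0.0]     # first dimension prefers action 1
tables[1][("s0", (1,))] = [0.0, 0.0, 5.0]   # given 1, second prefers action 2
tables[1][("s0", (0,))] = [9.0, 0.0, 0.0]   # given 0, it would prefer action 0
joint = select_joint_action(tables, "s0")
```

The second dimension's preference depends on what the first dimension chose, which is the implicit coordination that independent per-dimension learners lack.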

Paper Structure (11 sections, 2 equations, 3 figures)

Figures (3)

  • Figure 1: (a) Example of a prediction task on a 2D Piecewise Linear instance; (b) Comparison of its reward surface against a 2D Sigmoid instance at 4 different time steps.
  • Figure 2: Average episodic rewards (mean, std from 20 seeds) on the test sets of 5D Sigmoid and Piecewise Linear benchmark ($\lambda = 0.5$ and $\texttt{n\_act} = 3$). Generalization error on 5D Sigmoid results from hyperparameter selection on its training set.
  • Figure 3: Experiments investigating scaling behavior of algorithms. First row keeps number of actions per action dimension fixed at $\texttt{n\_act}=3$ and varies dimensionality $\texttt{dim}$ of action space. Second row keeps $\texttt{dim}$ fixed and varies $\texttt{n\_act}$. Both experiments keep importance decay $\lambda=0.5$ fixed. Rewards from test set (mean, std from 20 seeds).
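The two scaling axes in Figure 3 can be related to the size of the action space with a short calculation. The sketch below counts how many joint actions a single policy over all dimensions has to distinguish, compared with the total number of per-dimension outputs in a factorized policy. The function names and the example values of `dim` are illustrative assumptions and are not taken from the paper's experimental grid.

```python
def joint_action_count(dim: int, n_act: int) -> int:
    """Number of joint actions a single policy over all dimensions must rank."""
    return n_act ** dim


def factorized_output_count(dim: int, n_act: int) -> int:
    """Total outputs when each dimension has its own policy with n_act actions."""
    return dim * n_act


# Illustrative values only: n_act fixed at 3, dimensionality varied.
for dim in (2, 5, 10):
    joint = joint_action_count(dim, 3)
    factorized = factorized_output_count(dim, 3)
    print(f"dim={dim}: joint={joint}, factorized={factorized}")
```

For `n_act = 3` and `dim = 5`, this gives 243 joint actions against 15 factorized outputs; the joint count grows exponentially in `dim`, while the factorized count grows linearly.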