Sequential Multi-Agent Dynamic Algorithm Configuration

Chen Lu, Ke Xue, Lei Yuan, Yao Wang, Yaoyuan Wang, Sheng Fu, Chao Qian

TL;DR

The paper tackles dynamic algorithm configuration (DAC) for complex algorithms with inter-dependent hyperparameters by modeling the task as a contextual sequential multi-agent Markov decision process (MMDP). It introduces the Sequential Advantage Decomposition Network (SADN), which decomposes the global advantage into sequential per-agent advantages while satisfying the Individual Global Max (IGM) principle, so that agents can act in a decentralized way at execution time. Empirical results on synthetic DAC benchmarks and on the configuration of MOEA/D show that SADN outperforms state-of-the-art MARL baselines and generalizes across problem classes, supporting the dependency-aware sequential approach. The work offers a paradigm for automated algorithm configuration that explicitly accounts for parameter inter-dependencies and action ordering, and its code is open source.

Abstract

Dynamic algorithm configuration (DAC) is a recent trend in automated machine learning, which dynamically adjusts an algorithm's configuration during execution and relieves users of tedious trial-and-error tuning. Recently, multi-agent reinforcement learning (MARL) approaches have improved the configuration of multiple heterogeneous hyperparameters, making it possible to configure complex algorithms with many parameters. However, many complex algorithms have inherent inter-dependencies among their parameters (e.g., determining the operator type first and then the operator's parameter), which previous approaches do not consider, leading to sub-optimal results. In this paper, we propose the sequential multi-agent DAC (Seq-MADAC) framework to address this issue by considering the inherent inter-dependencies of multiple parameters. Specifically, we propose a sequential advantage decomposition network, which leverages action-order information through sequential advantage decomposition. Experiments ranging from synthetic functions to the configuration of multi-objective optimization algorithms demonstrate Seq-MADAC's superior performance over state-of-the-art MARL methods and show strong generalization across problem classes. Seq-MADAC establishes a new paradigm for dependency-aware automated algorithm configuration. Our code is available at https://github.com/lamda-bbo/seq-madac.

Paper Structure

This paper contains 43 sections, 2 theorems, 24 equations, 5 figures, 6 tables, 5 algorithms.

Key Result

Lemma 1

In any cooperative Markov game, given a joint policy $\boldsymbol{\pi}$, the global advantage function $A^{\boldsymbol{\pi}} (s,\boldsymbol{a})$, and $n$ agents in total, for any state $s$, the following equations hold:
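The lemma's equations are not shown above. In the form commonly stated in the HAPPO literature (an assumption here, and possibly different in notation from the paper's own statement), the decomposition reads $A^{\boldsymbol{\pi}}(s,\boldsymbol{a}) = \sum_{i=1}^{n} A^{\boldsymbol{\pi}}_{i}(s, a_{1:i-1}, a_i)$, where each per-agent advantage is the change in expected return when agent $i$ commits to $a_i$ after agents $1,\dots,i-1$ have already acted. The sketch below checks this telescoping identity numerically for a single state. It is not the authors' SADN code: the helper names, the random tabular $Q$, and the assumption that the joint policy factorizes into independent per-agent policies are all illustrative.

```python
import itertools
import numpy as np


def prefix_q(q, policies, prefix):
    """Q(s, a_{1:m}): joint Q with agents m+1..n averaged out under their policies.

    q        : array with one axis per agent, q[a_1, ..., a_n] for a fixed state s
    policies : list of per-agent action distributions (assumed independent)
    prefix   : tuple of actions already chosen by agents 1..m (may be empty)
    """
    sub = q[tuple(prefix)]  # remaining axes belong to agents m+1..n
    # Contract the last axis first so axis order stays aligned with agents.
    for pi in reversed(policies[len(prefix):]):
        sub = sub @ pi
    return float(sub)


def sequential_advantages(q, policies, joint_action):
    """Per-agent advantages A_i(s, a_{1:i-1}, a_i) in the fixed agent order."""
    return [
        prefix_q(q, policies, joint_action[: i + 1])
        - prefix_q(q, policies, joint_action[:i])
        for i in range(len(joint_action))
    ]


def global_advantage(q, policies, joint_action):
    """A(s, a) = Q(s, a) - V(s), with V(s) the policy-averaged Q."""
    return prefix_q(q, policies, joint_action) - prefix_q(q, policies, ())


# Hypothetical toy setup: 3 agents, 4 actions each, one state.
rng = np.random.default_rng(0)
n_agents, n_actions = 3, 4
q = rng.normal(size=(n_actions,) * n_agents)
policies = [rng.dirichlet(np.ones(n_actions)) for _ in range(n_agents)]

# Largest mismatch between the global advantage and the sum of sequential ones.
max_gap = max(
    abs(sum(sequential_advantages(q, policies, a)) - global_advantage(q, policies, a))
    for a in itertools.product(range(n_actions), repeat=n_agents)
)
print(f"max decomposition gap: {max_gap:.2e}")
```

Because each term is a difference of consecutive prefix values, the sum collapses to $Q(s,\boldsymbol{a}) - V(s)$ regardless of the agent order chosen; the order only changes how the total is split among agents, which is the information a sequential method can exploit.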

Figures (5)

  • Figure 1: Workflow of the proposed sequential advantage decomposition network (SADN).
  • Figure 2: Training curves of return value obtained by the compared methods on four Seq-Sigmoid variant tasks, where the results are averaged over 6 runs.
  • Figure 3: Training curves of return value obtained by the sequential methods on four Seq-Sigmoid-Robust tasks, where the results are averaged over 6 runs.
  • Figure 4: Training curves of the return value obtained by the compared methods on the original Sigmoid benchmark.
  • Figure 5: Training curves of return value obtained by correct order and reverse order on four Seq-Sigmoid variant tasks, where the results are averaged over 3 runs.

Theorems & Definitions (7)

  • Definition 1: Individual Global Max (Son et al., 2019, QTRAN)
  • Definition 2 (from HAPPO)
  • Lemma 1: Multi-Agent Advantage Decomposition (from HAPPO)
  • Theorem 1
  • proof
  • proof
  • Definition 3