Dependency Structure Search Bayesian Optimization for Decision Making Models

Mohit Rajpal, Lac Gia Tran, Yehong Zhang, Bryan Kian Hsiang Low

TL;DR

This work proposes a compact multi-layered architecture modeling the dynamics of agent interactions through the concept of role and introduces Dependency Structure Search Bayesian Optimization to efficiently optimize the multi-layered architecture parameterized by a large number of parameters.

Abstract

Many approaches for optimizing decision-making models rely on gradient-based methods that require informative feedback from the environment. However, when such feedback is sparse or uninformative, these approaches may perform poorly. Derivative-free approaches such as Bayesian Optimization reduce the dependency on the quality of gradient feedback, but are known to scale poorly in the high-dimensional setting of complex decision-making models. This problem is exacerbated if the model requires interactions between several agents cooperating to accomplish a shared goal. To address the dimensionality challenge, we propose a compact multi-layered architecture modeling the dynamics of agent interactions through the concept of role. We introduce Dependency Structure Search Bayesian Optimization to efficiently optimize the multi-layered architecture, which is parameterized by a large number of parameters, and show an improved regret bound. Our approach shows strong empirical results under malformed or sparse reward.

Paper Structure (45 sections, 20 theorems, 61 equations, 14 figures, 8 tables, 4 algorithms)

Key Result

Proposition 1

Let $\mathcal{G}_d = (V_d, E_d)$ represent an additive dependency structure with respect to $v(\theta)$. Then the following holds: $\forall a,b \; \frac{\partial^2 v}{\partial \theta^a \partial \theta^b} \neq 0 \implies (\Theta^a, \Theta^b) \in E_d$ which is a consequence of $v$ forme
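
The implication in Proposition 1 can be illustrated numerically: a non-zero mixed second derivative between two parameters signals that they must share an edge in the dependency graph. The following is a minimal sketch under stated assumptions, not the paper's search procedure. It uses central finite differences rather than any estimator from the paper, and the helper names (`mixed_partial`, `dependency_edges`), the toy objective `v`, the sampling range, and the tolerance are all hypothetical choices made for illustration.

```python
import itertools
import numpy as np

def mixed_partial(f, theta, a, b, h=1e-3):
    """Central finite-difference estimate of d^2 f / (d theta_a d theta_b)."""
    e_a = np.zeros_like(theta)
    e_a[a] = h
    e_b = np.zeros_like(theta)
    e_b[b] = h
    return (f(theta + e_a + e_b) - f(theta + e_a - e_b)
            - f(theta - e_a + e_b) + f(theta - e_a - e_b)) / (4.0 * h * h)

def dependency_edges(f, dim, n_samples=20, tol=1e-4, seed=0):
    """Collect index pairs whose mixed partial is non-zero at some sampled point."""
    rng = np.random.default_rng(seed)
    edges = set()
    for _ in range(n_samples):
        # Assumed sampling box; chosen so that x ** y is well defined.
        theta = rng.uniform(0.5, 2.0, size=dim)
        for a, b in itertools.combinations(range(dim), 2):
            if abs(mixed_partial(f, theta, a, b)) > tol:
                edges.add((a, b))
    return edges

def v(theta):
    # Hypothetical objective: theta[0] and theta[1] interact (x ** y),
    # while theta[2] enters additively and interacts with nothing.
    return theta[0] ** theta[1] + theta[2]

edges = dependency_edges(v, dim=3)
print(edges)
```

Only the pair of interacting coordinates is flagged here; the additive coordinate produces mixed partials that vanish up to floating-point error. Sampling several points matters because a mixed partial can be zero at an isolated point even when the variables are dependent.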

Figures (14)

  • Figure 1: Left: HOM architecture. gen uses $\theta_r$ and $\theta_g$ during evaluation to yield a model which represents the policy. $\theta_r$ and $\theta_g$ are optimized by BO. Right: Inferring $\mathbf{a}^{\alpha}$ given $\mathbf{s}^{\alpha}$.
  • Figure 2: Left, above: plot of $f(x, y) = x^{y}$; below: plot of $f(x,y) = x+y$. The curvature of additively constructed functions is zero; non-zero curvature indicates dependency among input variables. Right: examining the Hessian reveals the dependency structure, which decomposes complex problems into simpler problems solved by GP-UCB.
  • Figure 3: Ablation study. Training curves of our HOM and its ablated variants on different multi-agent environments.
  • Figure 4: Left two plots: Sparse reward drone delivery task. Rightmost: Comparison with HDBO approaches. The left two plots validate the same approaches on different environments.
  • Figure 5: Scaling analysis. Training curves of DSS-GP-UCB and competitors with increasing number of agents. The left column shows PredPrey with 6, 9, and 15 agents. The right column shows Het, PredPrey with 6, 9, and 15 agents.
  • ...and 9 more figures

Theorems & Definitions (34)

  • Proposition 1
  • Theorem 1
  • Theorem 2
  • Lemma 1
  • Proof
  • Corollary 1
  • Proof
  • Corollary 2
  • Proof
  • Lemma 2
  • ...and 24 more