
Latent Energy-Based Odyssey: Black-Box Optimization via Expanded Exploration in the Energy-Based Latent Space

Peiyu Yu, Dinghuai Zhang, Hengzhi He, Xiaojian Ma, Ruiyao Miao, Yifan Lu, Yasi Zhang, Deqian Kong, Ruiqi Gao, Jianwen Xie, Guang Cheng, Ying Nian Wu

TL;DR

This work seeks a latent space that serves as a compressed yet accurate representation of the design-value joint space, enabling effective latent exploration of high-value input design modes, and proposes the Noise-intensified Telescoping density-Ratio Estimation (NTRE) scheme for variational learning of an accurate latent space model without costly Markov Chain Monte Carlo.

Abstract

Offline Black-Box Optimization (BBO) aims at optimizing a black-box function using the knowledge from a pre-collected offline dataset of function values and corresponding input designs. However, the high-dimensional and highly multimodal input design space of the black-box function poses inherent challenges for most existing methods, which model and operate directly on input designs. These challenges include, but are not limited to, high sample complexity, which leads to inaccurate approximation of the black-box function, and insufficient coverage and exploration of input design modes, which leads to suboptimal proposals of new input designs. In this work, we consider finding a latent space that serves as a compressed yet accurate representation of the design-value joint space, enabling effective latent exploration of high-value input design modes. To this end, we formulate a learnable energy-based latent space and propose the Noise-intensified Telescoping density-Ratio Estimation (NTRE) scheme for variational learning of an accurate latent space model without costly Markov Chain Monte Carlo. Optimization then amounts to exploring high-value designs under the guidance of the learned energy-based model in the latent space, formulated as gradient-based sampling from a latent-variable-parameterized inverse model. We show that our particular parameterization encourages expanded exploration around high-value design modes, motivated by inverting a fundamental result on conditional covariance matrices that is typically used for variance reduction. We observe that our method, backed by an accurately learned informative latent space and an expanding-exploration model design, yields significant improvements over strong previous methods on both synthetic and real-world datasets such as the Design-Bench suite.
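
As a reading aid, the telescoping idea behind the scheme's name can be written generically. For any chain of intermediate densities $q_0, \dots, q_{K+1}$ with common support (the chain and its indexing are assumptions made here for illustration, not necessarily the paper's exact construction), a single hard-to-estimate density ratio factors into ratios between neighboring densities:

$$
\frac{q_0(\mathbf{z})}{q_{K+1}(\mathbf{z})} = \prod_{k=0}^{K} \frac{q_k(\mathbf{z})}{q_{k+1}(\mathbf{z})},
\qquad
\log \frac{q_0(\mathbf{z})}{q_{K+1}(\mathbf{z})} = \sum_{k=0}^{K} \log \frac{q_k(\mathbf{z})}{q_{k+1}(\mathbf{z})}.
$$

Each factor compares two nearby distributions, so it can be assigned to its own ratio estimator. How NTRE builds the intermediate densities and intensifies the noise between them is specified in the paper body, not in this summary.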

Paper Structure

This paper contains 56 sections, 8 theorems, 45 equations, 6 figures, 14 tables, 1 algorithm.

Key Result

Proposition 3.1

(Approx. grad.) As $M \to \infty$, $\nabla_{\boldsymbol{\alpha}_k} \mathcal{L}_M(\boldsymbol{\alpha}_k) \to \nabla_{\boldsymbol{\alpha}_k} \mathrm{ELBO}_{\rm re}$.

Figures (6)

  • Figure 1: Graphical illustration of our framework LEO. We illustrate the learning scheme in the left panel. We construct an energy-based latent space model $p_{\boldsymbol{\alpha}}$ for offline BBO by learning a series of ratio estimators $\{r_{\boldsymbol{\alpha}_k}\}_{k=0}^K$ with the NTRE objective $\mathcal{L}_{M\to\infty}$ to optimize the ELBO without MCMC. We illustrate the optimization scheme in the right panel. After training, we solve offline BBO by sampling from the $z$-parameterized inverse model $p_{\boldsymbol{\theta}}(\mathbf{x} | y) \propto \mathbb{E}_{\color{red} p_{\boldsymbol{\gamma}}(\mathbf{z} | y)} [{\color{red} p_{\boldsymbol{\beta}, \mathbf{x}}(\mathbf{x} | \mathbf{z})}]$, where $p_{\boldsymbol{\gamma}}(\mathbf{z} | y) \propto p_{\boldsymbol{\beta}, y}(y | \mathbf{z}) p_{\boldsymbol{\alpha}}(\mathbf{z})$, given $y$ as the offline dataset maximum. Specifically, we first sample $\mathbf{z} \sim {\color{red} p_{\boldsymbol{\gamma}}(\mathbf{z} | y)}$, and then generate $\mathbf{x} \sim {\color{red} p_{\boldsymbol{\beta}, \mathbf{x}}(\mathbf{x} | \mathbf{z})}$. We provide previous parameterizations for the inverse model in the right panel for reference. Best viewed in color.
  • Figure 2: Results of our method on the uniformly sampled Branin dataset with and without the top-10% points. Zoom in for more details.
  • Figure 3: Branin function level sets.
  • Figure 4: Visualization of Branin samples. (b-d) are results of our method. G-SV denotes the Gaussian prior model sampled with SVGD. MLE-LD and MLE-SV denote the LEBM trained by MLE and sampled with LD and SVGD, respectively.
  • Figure A1: Histogram of normalized function values in the Hopper dataset. The distribution is highly skewed towards low function values (plot from Mashkaria et al., 2023).
  • ...and 1 more figure
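
The two-stage sampling described in the Figure 1 caption (first $\mathbf{z} \sim p_{\boldsymbol{\gamma}}(\mathbf{z} | y)$, then $\mathbf{x} \sim p_{\boldsymbol{\beta}, \mathbf{x}}(\mathbf{x} | \mathbf{z})$) can be sketched on a toy problem. Everything concrete below is an assumption made for illustration: the quadratic latent energy, the standard normal base density, the linear Gaussian decoders `W_Y` and `A_X`, the noise scales, and the use of plain unadjusted Langevin dynamics as the gradient-based sampler. It is not the paper's architecture or training procedure, only a minimal instance of sampling from $p(\mathbf{z} | y) \propto p(y | \mathbf{z})\, p(\mathbf{z})$ and decoding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy components standing in for learned networks.
W_Y = np.array([1.0, 0.5])     # value decoder mean: y ~ N(W_Y . z, SIGMA_Y^2)
SIGMA_Y = 0.5
A_X = rng.normal(size=(4, 2))  # design decoder mean: x ~ N(A_X z, SIGMA_X^2 I)
SIGMA_X = 0.05
C = np.array([1.0, 1.0])       # centre of the toy energy E(z) = 0.5 * ||z - C||^2


def grad_log_posterior(z, y):
    """Gradient of log p(z | y) = -E(z) + log N(z; 0, I) + log p(y | z) + const."""
    grad_energy = z - C                           # gradient of the toy energy
    grad_base = -z                                # gradient of log N(z; 0, I)
    resid = y - z @ W_Y                           # shape (n_chains,)
    grad_lik = resid[:, None] * W_Y / SIGMA_Y**2  # gradient of log p(y | z)
    return -grad_energy + grad_base + grad_lik


def sample_latents(y, n_chains=512, n_steps=1000, step=0.01):
    """Unadjusted Langevin dynamics targeting the toy posterior p(z | y)."""
    z = rng.normal(size=(n_chains, 2))
    for _ in range(n_steps):
        noise = rng.normal(size=z.shape)
        z = z + step * grad_log_posterior(z, y) + np.sqrt(2.0 * step) * noise
    return z


def decode_designs(z):
    """Draw designs x from the toy Gaussian decoder p(x | z)."""
    noise = rng.normal(size=(z.shape[0], A_X.shape[0]))
    return z @ A_X.T + SIGMA_X * noise


y_target = 2.0                     # stands in for the offline dataset maximum
z_samples = sample_latents(y_target)
x_samples = decode_designs(z_samples)
pred_values = z_samples @ W_Y      # decoder-predicted values of the sampled latents
print(x_samples.shape, float(pred_values.mean()))
```

Because the toy posterior is Gaussian, the sampler's output can be checked against the closed-form posterior mean; with a learned, multimodal energy no such check is available, and the choice of sampler (for example LD versus SVGD, both named in the Figure 4 caption) becomes a real design decision.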

Theorems & Definitions (21)

  • Proposition 3.1
  • Remark 3.2
  • Definition 3.3
  • Theorem 3.4
  • Remark 3.5
  • Proposition 3.6
  • Remark 3.7
  • Theorem 3.8
  • Remark 3.9
  • proof
  • ...and 11 more