Functional Graphical Models: Structure Enables Offline Data-Driven Optimization

Jakub Grudzien Kuba, Masatoshi Uehara, Pieter Abbeel, Sergey Levine

TL;DR

This paper introduces functional graphical models (FGMs) and shows theoretically how they enable principled data-driven optimization (DDO) by decomposing the original high-dimensional optimization problem into smaller sub-problems. The resulting regret bounds imply that DDO with FGMs can achieve nearly optimal designs in situations where naive approaches fail due to insufficient coverage of the offline data.

Abstract

While machine learning models are typically trained to solve prediction problems, we often want to use them for optimization problems. For example, given a dataset of proteins and their corresponding fluorescence levels, we might want to optimize for a new protein with the highest possible fluorescence. This kind of data-driven optimization (DDO) presents a range of challenges beyond those in standard prediction problems, since we need models that successfully predict the performance of new designs that are better than the best designs seen in the training set. It is not clear theoretically when existing approaches can even perform better than the naive approach that simply selects the best design in the dataset. In this paper, we study how structure can enable sample-efficient data-driven optimization. To formalize the notion of structure, we introduce functional graphical models (FGMs) and show theoretically how they enable principled data-driven optimization by decomposing the original high-dimensional optimization problem into smaller sub-problems. This allows us to derive much more practical regret bounds for DDO, and the result implies that DDO with FGMs can achieve nearly optimal designs in situations where naive approaches fail due to insufficient coverage of the offline data. We further present a data-driven optimization algorithm that infers the FGM structure itself, either over the original input variables or over a latent variable representation of the inputs.

Paper Structure

This paper contains 32 sections, 11 theorems, 46 equations, 5 figures, and 1 algorithm.

Key Result

lemma 1

Suppose $f({\mathbf{x}})$ is twice-continuously differentiable w.r.t. ${\mathbf{x}}$. Let $A, B, S\subseteq \mathcal{V}$. Then, the following statements are equivalent.

Figures (5)

  • Figure 1: The clique set of this graph is given by $\{ \{0, 1, 2\}, \{2, 3, 4\}, \{4, 5, 6\} \}$. Hence, if it is an FGM of $f({\mathbf{x}})$, then $f({\mathbf{x}})=f_{0,1,2}({\mathbf{x}}_{0,1,2}) + f_{2,3,4}({\mathbf{x}}_{2,3,4}) + f_{4,5,6}({\mathbf{x}}_{4,5,6})$.
  • Figure 2: Consider a function $f({\textnormal{x}}_1, {\textnormal{x}}_2)=-({\textnormal{x}}_1-1)^2 -({\textnormal{x}}_2-2)^2$. Clearly, the singleton cliques $\{{\textnormal{x}}_1\}$ and $\{{\textnormal{x}}_2\}$ are functionally independent, while the data coming from a correlated normal distribution are not statistically independent. While the dataset does not jointly cover the optimal solution ${\mathbf{x}}^{\star}=(1,2)$, it does cover individual components ${\textnormal{x}}^{\star}_{1}=1, {\textnormal{x}}^{\star}_{2}=2$, and thus a method that can learn the component functions can compose them into ${\mathbf{x}}^{\star}$.
  • Figure 3: First panel: regret (lower is better) of DDO with (blue) and without (orange) FGM, against dimension in the toy quadratic problem. Averaged over 50 runs, showing 95% confidence intervals. Second panel: value (higher is better) of DDO with MLP neural networks, for Gaussian data, with (blue) and without (orange) FGM, against the iteration of gradient ascent. Averaged over 128 designs, showing one-tenth of a standard deviation.
  • Figure 4: Values of $f(\hat{{\mathbf{x}}})$, where $\hat{{\mathbf{x}}}\sim \pi$, along the course of gradient ascent on $\pi$, for 11-, 21-, 31-, and 41-dimensional problems ($x$-axis: iterations, $y$-axis: design values). The evaluation is over the top-128 designs from a sample of 1024. A missing curve indicates that invalid designs were generated. As the dimensionality grows, the FGM decomposition becomes more critical.
  • Figure 5: Values of $f(\hat{{\mathbf{x}}})$, where $\hat{{\mathbf{x}}}\sim \pi$, along the course of gradient ascent on $\pi$, for 41- and 61-dimensional problems ($x$-axis: iterations, $y$-axis: design values). The evaluation is over the top-128 designs from a sample of 1024. A missing curve indicates that invalid designs were generated. The unobserved base distribution generating the data was an even mixture of two Gaussian distributions $\mathcal{N}(-1, I)$ and $\mathcal{N}(1, I)$.
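
The coverage argument in the Figure 2 caption can be illustrated with a small discrete sketch. This is not the paper's algorithm: the integer grid, the hand-picked correlated dataset, and the helper `recover_components` are all assumptions made for illustration. The only ingredients taken from the caption are the objective $f(x_1, x_2)=-(x_1-1)^2-(x_2-2)^2$ and the idea that known additive structure lets component optima be composed into a design the data never jointly covered.

```python
# Illustrative sketch only: a discrete analogue of the Figure 2 setting.
# The grid, the dataset, and recover_components are hypothetical choices.

def f(x1, x2):
    """Objective from the Figure 2 caption; additive over {x1} and {x2}."""
    return -(x1 - 1) ** 2 - (x2 - 2) ** 2

# Correlated offline data: x1 and x2 move together, so the joint optimum
# (1, 2) never appears, although x1 = 1 and x2 = 2 each appear separately.
designs = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (3, 2), (3, 3)]
dataset = {x: f(*x) for x in designs}

def recover_components(data):
    """Recover f1 and f2 up to a constant shift, assuming y = f1(x1) + f2(x2)."""
    first = next(iter(data))
    f1, f2 = {first[0]: 0.0}, {}  # pin f1 at one point to fix the free constant
    changed = True
    while changed:
        changed = False
        for (a, b), y in data.items():
            if a in f1 and b not in f2:
                f2[b] = y - f1[a]
                changed = True
            elif b in f2 and a not in f1:
                f1[a] = y - f2[b]
                changed = True
    return f1, f2

f1, f2 = recover_components(dataset)

# Naive baseline: pick the best design that is already in the dataset.
best_in_data = max(dataset, key=dataset.get)

# Structured approach: maximize each component separately, then compose.
composed = (max(f1, key=f1.get), max(f2, key=f2.get))

print("best in dataset:", best_in_data, "value", dataset[best_in_data])
print("composed design:", composed, "value", f(*composed))
```

The recovery step works here because the observed designs link every value of $x_1$ to every value of $x_2$ through a chain of shared coordinates. With less overlap between the observed designs, the component functions would not be identifiable from this data alone.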

Theorems & Definitions (18)

  • definition 1: Functional Independence
  • lemma 1
  • definition 2: Functional Graphical Model (FGM)
  • theorem 1: Function Decomposition
  • theorem 2: Regret of DDO with FGMs
  • lemma 2: Improvement of Coverage terms
  • lemma 3
  • theorem 3: Hammersley-Clifford [clifford1971markov]
  • lemma 3
  • proof
  • ...and 8 more
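
To make the function decomposition concrete, the sketch below uses the clique structure from the Figure 1 caption, $\{0,1,2\}, \{2,3,4\}, \{4,5,6\}$, and checks that maximizing clique by clique gives the same optimum as exhaustive search. It is a toy under stated assumptions, not the paper's method: the ternary variable domain, the random lookup tables standing in for the clique functions, and the function names are hypothetical, and the clique functions are assumed known rather than learned from data.

```python
import itertools
import random

# Hypothetical setup: 7 ternary variables and random tables standing in
# for the three clique functions named in the Figure 1 caption.
VALUES = (0, 1, 2)
rng = random.Random(0)

def random_table(arity):
    """Random lookup table over all assignments to `arity` variables."""
    return {key: rng.random() for key in itertools.product(VALUES, repeat=arity)}

f012, f234, f456 = random_table(3), random_table(3), random_table(3)

def f(x):
    """Objective that decomposes over the cliques {0,1,2}, {2,3,4}, {4,5,6}."""
    return f012[x[0], x[1], x[2]] + f234[x[2], x[3], x[4]] + f456[x[4], x[5], x[6]]

def brute_force_max():
    """Search all 3**7 joint assignments."""
    return max(f(x) for x in itertools.product(VALUES, repeat=7))

def decomposed_max():
    """Fix the shared variables x2 and x4, then solve each clique on its own."""
    best = float("-inf")
    for x2, x4 in itertools.product(VALUES, repeat=2):
        left = max(f012[a, b, x2] for a in VALUES for b in VALUES)
        middle = max(f234[x2, c, x4] for c in VALUES)
        right = max(f456[x4, d, e] for d in VALUES for e in VALUES)
        best = max(best, left + middle + right)
    return best

print(brute_force_max(), decomposed_max())
```

Once the two shared variables are fixed, each sub-problem involves at most two free variables, so the decomposed search evaluates far fewer combinations than the exhaustive one while reaching the same maximum.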