Cliqueformer: Model-Based Optimization with Structured Transformers

Jakub Grudzien Kuba, Pieter Abbeel, Sergey Levine

TL;DR

This work addresses offline model-based optimization by exploiting the target function's structure through functional graphical models (FGMs). It introduces Cliqueformer, a transformer-based architecture that enforces a predefined FGM clique decomposition in a learned latent space and regularizes clique marginals with a variational information bottleneck, enabling robust design proposals. Across Latent Radial-Basis Functions, superconductors, TFBind-8, and DNA Enhancers, Cliqueformer achieves state-of-the-art performance without the need for explicit conservative penalties, demonstrating strong generalization to high-dimensional design tasks. By combining end-to-end FGM structure learning with scalable transformer components, the approach offers a practical pathway to applying deep models to complex design problems in chemistry and biology while mitigating distribution shift and enabling efficient optimization.

Abstract

Large neural networks excel at prediction tasks, but their application to design problems, such as protein engineering or materials discovery, requires solving offline model-based optimization (MBO) problems. While predictive models may not directly translate to effective design, recent MBO algorithms incorporate reinforcement learning and generative modeling approaches. Meanwhile, theoretical work suggests that exploiting the target function's structure can enhance MBO performance. We present Cliqueformer, a transformer-based architecture that learns the black-box function's structure through functional graphical models (FGM), addressing distribution shift without relying on explicit conservative approaches. Across various domains, including chemical and genetic design tasks, Cliqueformer demonstrates superior performance compared to existing methods.

Paper Structure

This paper contains 25 sections, 4 theorems, 8 equations, 6 figures, 5 tables, 1 algorithm.

Key Result

theorem 1

For a real-valued function $f({\mathbf{x}})$ whose FGM has maximal cliques $\mathcal{C}$, and a policy class $\Pi$, the regret of MBO with FGM information satisfies a bound (the displayed inequality is not reproduced here) in which $\text{C}_{\text{stat}}$ and $\text{C}_{\text{cpx}}$ are distribution and complexity constants defined in the appendix.
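To make the "maximal cliques" in Theorem 1 concrete, the following is a minimal sketch, not the authors' code. It takes the 5D decomposition $f({\mathbf{x}}) = f_{-5}({\mathbf{x}}_{-5}) + f_{-1}({\mathbf{x}}_{-1})$ from the Figure 2 caption, links two variables whenever they share an additive component, and enumerates the maximal cliques of the resulting graph. The helper names `fgm_edges` and `maximal_cliques` are hypothetical, and linking co-occurring variables is an assumption that yields a valid FGM for the given decomposition, though not necessarily the sparsest one.

```python
from itertools import combinations


def fgm_edges(components):
    """Link two variables whenever they appear together in some additive component."""
    edges = set()
    for comp in components:
        edges.update(combinations(sorted(comp), 2))
    return edges


def maximal_cliques(nodes, edges):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already-explored vertices
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(nodes), set())
    return found


# f_{-5} depends on x1..x4 and f_{-1} depends on x2..x5 (assumed reading of the caption).
components = [{1, 2, 3, 4}, {2, 3, 4, 5}]
nodes = set().union(*components)
edges = fgm_edges(components)
cliques = maximal_cliques(nodes, edges)
```

In this sketch, `(1, 5)` never enters `edges`, matching the caption's statement that the two nodes are not linked, and the two maximal cliques coincide with the supports of the two additive components.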

Figures (6)

  • Figure 1: The first building blocks of the Lat. RBF tasks are 3D radial-basis functions (left). These functions are applied to triplets arranged in a chain-of-triangles FGM (center) and linearly mixed. Observable designs are then produced by non-linear transformations of the chain and, together with their values, form a dataset. We show the score (right) of our structure-learning Cliqueformer and the structure-oblivious COMs (Trabucco et al., 2021) against the dimension of the Lat. RBF functions, which is varied only by changing the number of triangles. Cliqueformer, unlike COMs, sustains strong performance across all dimensions. More results appear in the experiments section.
  • Figure 2: An FGM of a 5D function which decomposes as $f({\mathbf{x}}) = f_{-5}({\mathbf{x}}_{-5}) + f_{-1}({\mathbf{x}}_{-1})$. By the definition of an FGM, nodes ${\textnormal{x}}_1$ and ${\textnormal{x}}_5$ are not linked.
  • Figure 3: Left column: FGM known. The model without the FGM information (red) fits the data poorly (loss panel) and leads to poor designs (value panel). The model with FGM information (blue) achieves a good fit and leads to designs largely superior to the data. Right column: FGM unknown. The model with an FGM decomposition (blue) achieves a slightly better fit than an oblivious model (red) but leads to significantly better designs.
  • Figure 4: Illustration of the construction in the proof of the rotation theorem, in 2D. Red axes (${\mathbf{z}}$) show contour lines of $f$ depending on both coordinates ${\textnormal{z}}_1$ and ${\textnormal{z}}_2$. Blue axes (${\mathbf{v}}$) show the same contours depending only on ${\textnormal{v}}_1$ after rotation. The Gaussian distribution (green circles) maintains its circular shape in both coordinate systems, demonstrating invariance under rotation.
  • Figure 5: Illustration of information flow in Cliqueformer's training. Data are shown in navy, learnable variables in blue, neural modules in red, and loss functions in pink. The input ${\mathbf{x}}$ is passed to a transformer encoder to compute representation ${\mathbf{z}}$ which is decomposed into cliques with small overlapping knots (highlighted in colors on the figure). The representation goes to the parallel MLPs whose outputs, added together, predict target ${\textnormal{y}}$. The representation ${\mathbf{z}}$ is also fed to a transformer decoder that tries to recover the original input ${\mathbf{x}}$. Additionally, the representation goes through an information bottleneck during training.
  • ...and 1 more figure
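The prediction path described in the Figure 5 caption (a representation ${\mathbf{z}}$ split into cliques with small overlapping knots, with parallel MLPs whose outputs are summed) can be sketched as follows. This is an illustrative, standard-library-only sketch under stated assumptions: the cliques are arranged in a chain where neighbours share `knot_size` coordinates, the MLPs have random untrained weights, and the transformer encoder, decoder, and information bottleneck are omitted. All function names and sizes are hypothetical and not taken from the paper.

```python
import math
import random


def clique_indices(n_cliques, clique_size, knot_size):
    """Index sets for a chain of cliques; neighbours overlap in `knot_size` dims (assumed layout)."""
    stride = clique_size - knot_size
    return [list(range(c * stride, c * stride + clique_size)) for c in range(n_cliques)]


def latent_dim(n_cliques, clique_size, knot_size):
    """Total number of latent coordinates covered by the chain of cliques."""
    return n_cliques * (clique_size - knot_size) + knot_size


def make_mlp(in_dim, hidden, rng):
    """Tiny one-hidden-layer tanh MLP with random weights, returned as a function."""
    w1 = [[rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(hidden)]
    w2 = [rng.gauss(0, 1) for _ in range(hidden)]

    def mlp(v):
        h = [math.tanh(sum(w * x for w, x in zip(row, v))) for row in w1]
        return sum(w * a for w, a in zip(w2, h))

    return mlp


def predict(z, cliques, mlps):
    """Additive prediction: each MLP sees only its own clique of z."""
    return sum(mlp([z[i] for i in idx]) for idx, mlp in zip(cliques, mlps))


rng = random.Random(0)
cliques = clique_indices(n_cliques=3, clique_size=4, knot_size=1)
d = latent_dim(n_cliques=3, clique_size=4, knot_size=1)
mlps = [make_mlp(in_dim=4, hidden=8, rng=rng) for _ in cliques]
z = [rng.gauss(0, 1) for _ in range(d)]  # stand-in for the encoder output
y_hat = predict(z, cliques, mlps)
```

Because each MLP reads only its own slice of `z`, perturbing a coordinate that belongs to a single clique changes only that clique's term in the sum, which is the additive structure the FGM decomposition is meant to impose.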

Theorems & Definitions (6)

  • definition 1
  • theorem 1: grudzien2024functional
  • theorem 2
  • theorem 2: grudzien2024functional
  • theorem 2
  • proof