Verifier-Constrained Flow Expansion for Discovery Beyond the Data

Riccardo De Santi, Kimon Protopapas, Ya-Ping Hsieh, Andreas Krause

TL;DR

Flow Expander (FE) is a scalable mirror descent scheme that provably tackles both global and local flow expansion via verifier-constrained entropy maximization over the flow process's noised state space, with convergence guarantees under both idealized and general assumptions.

Abstract

Flow and diffusion models are typically pre-trained on limited available data (e.g., molecular samples), covering only a fraction of the valid design space (e.g., the full molecular space). As a consequence, they tend to generate samples from only a narrow portion of the feasible domain. This is a fundamental limitation for scientific discovery applications, where one typically aims to sample valid designs beyond the available data distribution. To this end, we address the challenge of leveraging access to a verifier (e.g., an atomic bonds checker) to adapt a pre-trained flow model so that its induced density expands beyond regions of high data availability while preserving sample validity. We introduce formal notions of strong and weak verifiers and propose algorithmic frameworks for global and local flow expansion via probability-space optimization. Then, we present Flow Expander (FE), a scalable mirror descent scheme that provably tackles both problems by verifier-constrained entropy maximization over the flow process's noised state space. Next, we provide a thorough theoretical analysis of the proposed method and state convergence guarantees under both idealized and general assumptions. Finally, we empirically evaluate our method on both illustrative yet visually interpretable settings and on a molecular design task, showcasing the ability of FE to expand a pre-trained flow model, increasing conformer diversity while preserving validity.
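
To make the idea of verifier-constrained entropy maximization concrete, the following is a minimal toy sketch and not the authors' Flow Expander. It assumes a finite state space instead of a flow process, a binary verifier given as a boolean mask, and an entropic mirror descent (exponentiated gradient) update; the names `expand`, `entropy`, `p_pre`, `valid`, `step`, and `iters` are hypothetical and chosen only for illustration.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    nz = p > 0
    return float(-(p[nz] * np.log(p[nz])).sum())


def expand(p_pre, valid, step=0.5, iters=50):
    """Toy entropy maximization restricted to verifier-approved states.

    Assumptions (not from the paper): a finite state space, a binary
    verifier mask `valid`, and a step size strictly between 0 and 1.
    """
    assert 0.0 < step < 1.0, "step must lie strictly between 0 and 1"
    p_pre = np.asarray(p_pre, dtype=float)
    valid = np.asarray(valid, dtype=bool)

    # Enforce the verifier constraint: no mass on rejected states.
    p = np.where(valid, p_pre, 0.0)
    assert p.sum() > 0, "pre-trained model must cover some valid state"
    p = p / p.sum()

    for _ in range(iters):
        support = p > 0
        # Entropic mirror ascent on H(p): p_new ∝ p * exp(step * grad H(p)),
        # with grad H(p) = -(log p + 1), which simplifies to p ** (1 - step).
        logits = (1.0 - step) * np.log(p[support])
        logits -= logits.max()  # numerical stability before exponentiating
        new_p = np.zeros_like(p)
        new_p[support] = np.exp(logits)
        p = new_p / new_p.sum()
    return p


if __name__ == "__main__":
    p_pre = np.array([0.70, 0.20, 0.05, 0.03, 0.02, 0.0])
    valid = np.array([True, True, True, False, True, False])
    p_star = expand(p_pre, valid)
    print("entropy before:", entropy(p_pre))
    print("entropy after: ", entropy(p_star))
    print("mass on invalid states:", p_star[~valid].sum())
```

In this toy setting the iterates spread mass towards the uniform distribution over the valid states that the pre-trained distribution already covers, while states rejected by the verifier keep zero mass. Valid states with zero pre-trained mass are never reached here, which is one of several ways the sketch is simpler than the setting described in the abstract.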

Paper Structure (48 sections, 9 theorems, 60 equations, 9 figures, 4 tables, 4 algorithms)

Key Result

lemma 4.0

For objectives defined in the form of Eq. eq:noised_flow_expansion_problem, we have:

Figures (9)

  • Figure 1: Limited coverage of the valid design space leads to generating sub-optimal samples for downstream optimization tasks.
  • Figure 2: Pre-trained and globally expanded flow model inducing the pre-trained density $p^{pre}_1$ and the optimal density $p_1^*$. Valid design space $\Omega$, strong and weak verifiers $\Omega_{v_i}$, $i \in [3]$, and their compositions.
  • Figure 3: (top) Global FE (G-FE) expands the pre-trained flow model $\pi^{pre}$ (panel a) into $\pi^*$ (violet, panel b), increasing coverage (i.e., entropy) while preserving validity (i.e., the red ellipse interior). Compared with the unconstrained exploration method S-MEME and constrained generation (CONSTR), G-FE shows best-of-both-worlds behaviour, achieving near-optimal entropy and validity (panel d).
  • Figure 4: Entropy-Validity
  • Figure 5: (top) L-FE (yellow, panel c) expands the pre-trained flow model $\pi^{pre}$ (green, panel a) over promising yet verifier-filtered modes, while FDC (blue, panel b) expands $\pi^{pre}$ over all plausible modes, leading to increased density in invalid regions (left mode in panel b). (bottom) FE increases visual (panel a) and quantitative (panel c) diversity while preserving higher validity than FDC (panels b-d).
  • ...and 4 more figures

Theorems & Definitions (16)

  • definition 1: Strong Verifier
  • definition 2: Weak Verifier
  • lemma 4.0: First Variation of Flow Process Functionals
  • proposition 1
  • theorem 5.0: Convergence guarantee in the idealized process-level setting
  • theorem 5.0: Convergence guarantee in the general process-level setting (informal)
  • lemma C.1
  • proof
  • lemma C.2
  • proof
  • ...and 6 more