Verifier-Constrained Flow Expansion for Discovery Beyond the Data

Riccardo De Santi; Kimon Protopapas; Ya-Ping Hsieh; Andreas Krause

TL;DR

Flow Expander (FE) is a scalable mirror descent scheme that provably tackles both global and local flow expansion via verifier-constrained entropy maximization over the flow process's noised state space, with convergence guarantees under both idealized and general assumptions.

Abstract

Flow and diffusion models are typically pre-trained on limited available data (e.g., molecular samples), covering only a fraction of the valid design space (e.g., the full molecular space). As a consequence, they tend to generate samples from only a narrow portion of the feasible domain. This is a fundamental limitation for scientific discovery applications, where one typically aims to sample valid designs beyond the available data distribution. To this end, we address the challenge of leveraging access to a verifier (e.g., an atomic bonds checker) to adapt a pre-trained flow model so that its induced density expands beyond regions of high data availability while preserving sample validity. We introduce formal notions of strong and weak verifiers and propose algorithmic frameworks for global and local flow expansion via probability-space optimization. We then present Flow Expander (FE), a scalable mirror descent scheme that provably tackles both problems by verifier-constrained entropy maximization over the flow process's noised state space. Next, we provide a thorough theoretical analysis of the proposed method and state convergence guarantees under both idealized and general assumptions. Finally, we empirically evaluate our method on illustrative, visually interpretable settings and on a molecular design task, showcasing the ability of FE to expand a pre-trained flow model, increasing conformer diversity while preserving validity.
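The idea of alternating an entropy-increasing step with a verifier-enforcing step can be illustrated on a toy problem. The sketch below is not the paper's Flow Expander algorithm: it assumes a finite state space, a strong verifier given as a boolean mask, and an entropic mirror ascent update on the Shannon entropy, followed by a KL projection onto distributions supported on the valid set. All names (`expansion_step`, `projection_step`, `toy_flow_expansion`, `p_pre`, `valid`) and the values of `eta` and `steps` are hypothetical and chosen only for illustration.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def expansion_step(p, eta):
    """Entropic mirror ascent step on H(p) (illustrative assumption).

    The gradient of H at p is -(log p_i + 1), so the multiplicative update
    p_i * exp(-eta * (log p_i + 1)) is proportional to p_i ** (1 - eta).
    States with zero mass stay at zero.
    """
    weights = [q ** (1.0 - eta) if q > 0.0 else 0.0 for q in p]
    total = sum(weights)
    return [w / total for w in weights]


def projection_step(p, valid):
    """KL projection onto distributions supported on the verifier's valid set:
    zero out invalid states and renormalize."""
    weights = [q if ok else 0.0 for q, ok in zip(p, valid)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no probability mass on valid states")
    return [w / total for w in weights]


def toy_flow_expansion(p_pre, valid, eta=0.5, steps=50):
    """Alternate expansion and projection, starting from the pre-trained density."""
    p = projection_step(p_pre, valid)
    for _ in range(steps):
        p = expansion_step(p, eta)
        p = projection_step(p, valid)
    return p


# Hypothetical pre-trained density concentrated on one state; state 3 is invalid.
p_pre = [0.90, 0.05, 0.03, 0.01, 0.01]
valid = [True, True, True, False, True]
p_star = toy_flow_expansion(p_pre, valid)
```

In this toy setting the iterates spread mass over the valid states while the invalid state keeps zero probability, which mirrors the coverage-versus-validity trade-off described in the abstract. The method in the paper operates on flow processes rather than on a finite probability vector.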

Paper Structure (48 sections, 9 theorems, 60 equations, 9 figures, 4 tables, 4 algorithms)

Introduction
Our approach
Our contributions
Background and Notation
Problem Statement: Global and Local Flow Expansion
An Idealized Problem: Global Flow Expansion via Strong Verifiers
A Realistic Framework: Local Flow Expansion via Weak Verifiers
Flow-Expander: Scalable Global and Local Expansion via Verifier-Constrained Noised Space Entropy Maximization
Expansion step.
Projection step.
Complete algorithm execution.
Closed-form gradient expressions.
Guarantees for Flow-Expander
Idealized setting.
General setting.
...and 33 more sections

Key Result

lemma 4.0

For objectives of the form given in the noised flow expansion problem (Eq. eq:noised_flow_expansion_problem), we have:

Figures (9)

Figure 1: Limited coverage of the valid design space leads to generating sub-optimal samples for downstream optimization tasks.
Figure 2: Pre-trained and globally expanded flow models, inducing the density $p^{pre}_1$ and the optimal density $p_1^*$, respectively. Valid design space $\Omega$, strong and weak verifiers $\Omega_{v_i}$, $i \in [3]$, and their compositions.
Figure 3: (top) Global FE (G-FE) expands the pre-trained flow model $\pi^{pre}$ into $\pi^*$ (violet), increasing coverage (i.e., entropy) while preserving validity (i.e., staying within the red ellipse interior). Compared with the unconstrained exploration method S-MEME and with constrained generation (CONSTR), G-FE shows best-of-both-worlds behaviour, achieving near-optimal entropy and validity.
Figure 4: Entropy-Validity
Figure 5: (top) L-FE (yellow) expands the pre-trained flow model $\pi^{pre}$ (green) over promising yet verifier-filtered modes, while FDC (blue) expands $\pi^{pre}$ over all plausible modes, leading to increased density in invalid regions (the left mode). (bottom) FE increases both visual and quantitative diversity while preserving higher validity than FDC.
...and 4 more figures

Theorems & Definitions (16)

definition 1: Strong Verifier
definition 2: Weak Verifier
lemma 4.0: First Variation of Flow Process Functionals
proposition 1
theorem 5.0: Convergence guarantee in the idealized process-level setting
theorem 5.0: Convergence guarantee in the general process-level setting (informal)
lemma C.1
proof
lemma C.2
proof
...and 6 more
