Robust Inference-Time Steering of Protein Diffusion Models via Embedding Optimization

Minhuan Li; Jiequn Han; Pilar Cossio; Luhuan Wu

TL;DR

This work tackles the challenge of inferring biomolecular conformations that honor both physical realism and experimental data by leveraging diffusion-model priors. It introduces EmbedOpt, an inference-time approach that optimizes the conditional embedding $c$ to maximize an experimental likelihood reward, effectively reshaping the prior to align with measurements without modifying model parameters. Across cryo-EM map fitting and distance-constraint benchmarks, EmbedOpt demonstrates higher robustness and similar or improved accuracy relative to posterior sampling baselines, while requiring fewer diffusion steps for efficient inference. The method offers practical impact by enabling more reliable and scalable structure determination under prior–likelihood mismatch, and it points to future directions in embedding-space regularization and ensemble inference.

Abstract

In many biophysical inverse problems, the goal is to generate biomolecular conformations that are both physically plausible and consistent with experimental measurements. As recent sequence-to-structure diffusion models provide powerful data-driven priors, posterior sampling has emerged as a popular framework by guiding atomic coordinates to target conformations using experimental likelihoods. However, when the target lies in a low-density region of the prior, posterior sampling requires aggressive and brittle weighting of the likelihood guidance. Motivated by this limitation, we propose EmbedOpt, an alternative inference-time approach for steering diffusion models to optimize experimental likelihoods in the conditional embedding space. As this space encodes rich sequence and coevolutionary signals, optimizing over it effectively shifts the diffusion prior to align with experimental constraints. We validate EmbedOpt on two benchmarks simulating cryo-electron microscopy map fitting and experimental distance constraints. We show that EmbedOpt outperforms the coordinate-based posterior sampling method in map fitting tasks, matches performance on distance constraint tasks, and exhibits superior engineering robustness across hyperparameters spanning two orders of magnitude. Moreover, its smooth optimization behavior enables a significant reduction in the number of diffusion steps required for inference, leading to better efficiency.
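The brittleness of likelihood upweighting mentioned in the abstract can be illustrated with a one-dimensional conjugate Gaussian example. This is only a sketch under stated assumptions: the prior, the measurement, and the helper name `weighted_gaussian_posterior` are hypothetical and are not the paper's synthetic experiment. Raising a Gaussian likelihood to a power $w$ multiplies its precision by $w$, so the weighted posterior stays Gaussian and has a closed form.

```python
import numpy as np

def weighted_gaussian_posterior(mu0, s0, y, s, w):
    """Posterior for prior N(mu0, s0^2) times the likelihood N(y | x, s^2) raised to the power w."""
    precision = 1.0 / s0**2 + w / s**2           # curvature of the negative log-density
    mean = (mu0 / s0**2 + w * y / s**2) / precision
    return mean, np.sqrt(1.0 / precision)

# Hypothetical mismatch: the measurement sits six prior standard deviations away.
mu0, s0 = 0.0, 1.0
y, s = 6.0, 1.0

for w in (1.0, 10.0, 100.0):
    mean, std = weighted_gaussian_posterior(mu0, s0, y, s, w)
    print(f"w={w:6.1f}  posterior mean={mean:.3f}  posterior std={std:.3f}")
```

In this toy setting the posterior mean approaches the measurement only as $w$ grows, while the curvature of the log-density grows linearly in $w$. That is one way to see why a gradient-based sampler with a fixed step size can become unstable under heavy upweighting.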

Paper Structure (42 sections, 2 theorems, 41 equations, 8 figures, 4 algorithms)

Introduction
Background
Diffusion Models.
Protein Sequence-to-Structure Diffusion Models.
Method
Setup.
EmbedOpt: Objective and Main Algorithm
Gradient Normalization.
Theoretical Analysis
Comparison to DPS
Related Works
Machine-Learning Methods for Solving Structure-Determination Inverse Problems.
Exploration of Alternative Conformations.
General Diffusion Inference-Time Steering.
Experiments
...and 27 more sections

Key Result

Proposition 1

For a fixed $(x_t, c_t, \sigma_t)$, consider the transition to $(x_{t-1}, c_{t-1}, \sigma_{t-1})$ under the EmbedOpt update (alg:embed-opt, Lines 6-8). Assume $F$ is locally smooth and $\hat{x}_\theta$ is bounded in a neighborhood of $(x_t, c_t, \sigma_t)$. The following bound holds where $g_{c_t}=\nabla_{c_t} F(x_t,c_t, \sigma_t)$ and $G_x$ and $G_\sigma$ are Lipschitz constants depending on $(x_

Figures (8)

Figure 1: Synthetic illustration comparing DPS and EmbedOpt under prior-likelihood mismatch. The diffusion prior has limited overlap with the measurement likelihood (top). With a likelihood weight of $1$ (controlled by $\alpha_t$ in \ref{eq:dps-update-main-text2}), the posterior is distant from the measurement (second). Upweighting the likelihood (third) moves the posterior toward the measurement but leads to an ill-conditioned sampling landscape. In contrast, EmbedOpt (bottom) iteratively updates the prior during sampling, progressively correcting the initial prior misspecification and enabling samples to concentrate near the measurement. All curves are shown as smoothed histograms. See \ref{app:sec:synthetic} for more details.
Figure 2: Schematic of Inference-Time Embedding Optimization (EmbedOpt). Experimental measurements such as cryo-EM maps provide rich conformational information that may not correspond to the dominant modes of pretrained prediction models. EmbedOpt adapts a pretrained diffusion model at inference time by maximizing an experimental likelihood reward $R(\cdot)$. (Top) In a single forward pass, the conditional embedding is iteratively updated ($c_t \rightarrow c_{t-1}$) using gradients from the denoised structure $\hat{x}_\theta$, greedily increasing the reward along the denoising trajectory. (Bottom) While the approach is applicable to general conditional frameworks, we use an AlphaFold 3-style backbone where the Conditioning Module serves as an embedder of coevolutionary information, processing sequence and MSA inputs to generate the initial embedding $c_T$.
Figure 3: Cryo-EM Map Fitting Benchmark. (a) Visualization of a challenging target: 8H1I requires significant inter-domain rearrangement of the prior structure ($cc = 0.42$) to fit the target density map (gray volume). DPS remains trapped in a local optimum ($cc = 0.58$), whereas EmbedOpt successfully reorients the domains ($cc = 0.93$). $cc$ is the map correlation coefficient. (b) Best-achieved Performance vs. Task Difficulty: (left) Best-sampled structures (from $7$ learning rates $\times$ $3$ random seeds) across 77 systems (dots), binned by task difficulty ($1 -$ map correlation of the prior model). Mean and standard deviation are calculated in each bin. EmbedOpt maintains an advantage, especially on harder targets (difficulty $>0.4$). (right) The difference plot shows EmbedOpt outperforming DPS across the majority of systems. (c) Hyperparameter Sensitivity: Distribution of map correlations across all systems for varying learning rates. Box plots span the interquartile range (25th-75th percentile) with the median highlighted. Whiskers extend to 1.5× IQR. Individual data points are shown as semi-transparent dots. DPS exhibits a brittle "sweet spot" due to the tug-of-war between guidance strength and structural validity controlled by the learning rate. EmbedOpt maintains high performance across learning rates spanning two orders of magnitude.
Figure 4: Distance Constraint Benchmark. (a) Optimization Trajectory: Representative surrogate reward traces for system 6V7W show that EmbedOpt increases the surrogate reward smoothly and monotonically to the optimum, while DPS exhibits high-frequency volatility. (b) Step-Efficiency Scaling: We reduce the number of diffusion steps from 200 down to 20 while keeping the product (base learning rate $\alpha$) $\times$ (number of diffusion steps) constant. The constant is set by $\alpha=0.1$ at 200 steps, an optimal choice for both methods based on Figure \ref{app:fig:af_distance_results}b. Left: With sufficient diffusion steps (200), both methods satisfy nearly all constraints; however, DPS performance deteriorates sharply below 100 steps, whereas EmbedOpt maintains a median constraint satisfaction rate above 75%. Right: EmbedOpt preserves good stereochemical quality (MolProbity score $<2$) as diffusion steps are reduced fourfold ($200 \to 50$), while DPS degrades substantially below 100 steps.
Figure 5: Cryo-EM Map Fitting Benchmark: Sample Gallery of Representative Results across Test Systems (Hyperparameter-Tuned). Structures display the best samples from each method following hyperparameter sweeping. Both methods perform robustly on targets where the unguided prior is already well-aligned with the target map (e.g., 8CAW). However, for targets requiring significant global conformational rearrangement (e.g., 8W2Q), EmbedOpt consistently achieves higher map correlation. We note 8F2R as the unique outlier where DPS significantly outperforms EmbedOpt, and 8K23 as a failure case where neither method successfully recovers the target structure.
...and 3 more figures
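Two quantities named in the captions of Figures 3 and 4 can be sketched in a few lines. The sketch assumes that $cc$ is the Pearson correlation over all voxels of two density maps on the same grid; the paper may use a masked or otherwise different definition. The function names `map_correlation`, `task_difficulty`, and `scaled_learning_rate` are hypothetical. The learning-rate rule follows the Figure 4 caption: the product of base learning rate and number of steps is held constant, fixed by $\alpha = 0.1$ at 200 steps.

```python
import numpy as np

def map_correlation(map_a, map_b):
    """Pearson correlation between two density maps sampled on the same grid (assumed definition of cc)."""
    a = np.asarray(map_a, dtype=float).ravel()
    b = np.asarray(map_b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant map")
    return float(a @ b / denom)

def task_difficulty(prior_map, target_map):
    """Task difficulty as used for binning in Figure 3b: 1 - map correlation of the prior model."""
    return 1.0 - map_correlation(prior_map, target_map)

def scaled_learning_rate(num_steps, ref_lr=0.1, ref_steps=200):
    """Base learning rate that keeps (learning rate x number of steps) equal to ref_lr * ref_steps."""
    return ref_lr * ref_steps / num_steps

for steps in (200, 100, 50, 20):
    print(steps, scaled_learning_rate(steps))
```

Under this rule, cutting the number of steps by a factor of ten raises the per-step learning rate by the same factor, so the step-efficiency comparison also tests how well each method tolerates larger individual updates.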

Theorems & Definitions (2)

Proposition 1: Informal
Proposition 2: EmbedOpt one-step surrogate reward improvement
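The idea behind a one-step surrogate reward improvement can be illustrated on a toy problem. Everything below is an assumption made for illustration and is not the paper's algorithm or model: the denoiser is the exact posterior-mean denoiser for a Gaussian prior $N(c, I)$, so the embedding $c$ is simply the prior mean; the reward is a negative squared distance to a measurement $y$; the embedding update is a normalized gradient-ascent step; and the sampler is a deterministic update on a geometric noise schedule. All function names are hypothetical.

```python
import numpy as np

def toy_denoiser(x_t, c, sigma):
    """Posterior-mean denoiser for data drawn from N(c, I) and observed with noise level sigma."""
    w = 1.0 / (1.0 + sigma**2)
    return w * x_t + (1.0 - w) * c

def reward(x, y):
    """Toy experimental reward: negative half squared distance to the measurement y."""
    return -0.5 * float(np.sum((x - y) ** 2))

def embedding_gradient(x_t, c, sigma, y):
    """Gradient with respect to c of the surrogate reward F = reward(toy_denoiser(x_t, c, sigma), y)."""
    w = 1.0 / (1.0 + sigma**2)
    return (1.0 - w) * (y - toy_denoiser(x_t, c, sigma))

def embedding_step(x_t, c, sigma, y, lr):
    """One normalized gradient-ascent step on the embedding (assumed update rule)."""
    g = embedding_gradient(x_t, c, sigma, y)
    return c + lr * g / (np.linalg.norm(g) + 1e-12)

def sample(y, lr, num_steps=50, seed=0):
    """Deterministic toy sampler; the embedding is updated before each denoising step."""
    rng = np.random.default_rng(seed)
    sigmas = np.geomspace(10.0, 0.01, num_steps + 1)
    x = sigmas[0] * rng.standard_normal(y.shape)
    c = np.zeros_like(y)  # initial embedding: prior centered away from y
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        c = embedding_step(x, c, sigma, y, lr)           # steer the prior (no-op when lr = 0)
        x_hat = toy_denoiser(x, c, sigma)                # denoise under the updated embedding
        x = x_hat + (sigma_next / sigma) * (x - x_hat)   # move to the next noise level
    return x, c

y = np.array([4.0, -3.0])
x_plain, _ = sample(y, lr=0.0)
x_steered, _ = sample(y, lr=0.3)
print("distance to y without steering:", np.linalg.norm(x_plain - y))
print("distance to y with embedding steps:", np.linalg.norm(x_steered - y))
```

In this toy the surrogate reward is quadratic in $c$, so a sufficiently small normalized step along the gradient increases it. The propositions listed above concern the actual diffusion model and are not reproduced by this sketch.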
