Inference-time optimization for experiment-grounded protein ensemble generation

Advaith Maddipatla, Anar Rzayev, Marco Pegoraro, Martin Pacesa, Paul Schanda, Ailie Marx, Sanketh Vedula, Alex M. Bronstein

TL;DR

This work introduces a general inference-time optimization framework that consistently outperforms state-of-the-art guidance, improving diversity, physical energy, and agreement with data in X-ray crystallography and NMR, often fitting the experimental data better than deposited PDB structures.

Abstract

Protein function relies on dynamic conformational ensembles, yet current generative models such as AlphaFold3 often fail to produce ensembles that match experimental data. Recent experiment-guided generators attempt to address this by steering the reverse diffusion process. However, these methods are limited by fixed sampling horizons and sensitivity to initialization, often yielding thermodynamically implausible results. We introduce a general inference-time optimization framework to address these limitations. First, we optimize over latent representations to maximize ensemble log-likelihood, rather than perturbing structures post hoc. This approach eliminates dependence on diffusion length, removes initialization bias, and easily incorporates external constraints. Second, we present novel sampling schemes for drawing Boltzmann-weighted ensembles. By combining structural priors from AlphaFold3 with force-field-based priors, we sample from their product distribution while balancing experimental likelihoods. Our results show that this framework consistently outperforms state-of-the-art guidance, improving diversity, physical energy, and agreement with data in X-ray crystallography and NMR, often fitting the experimental data better than deposited PDB structures. Finally, inference-time optimization experiments maximizing ipTM scores reveal that perturbing AlphaFold3 embeddings can artificially inflate model confidence. This exposes a vulnerability in current design metrics, whose mitigation could offer a pathway to reduce false discovery rates in binder engineering.
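
Sampling from the product of a structural prior and a force-field prior can be illustrated with standard self-normalized importance sampling. The sketch below is a minimal illustration under stated assumptions, not the authors' sampling scheme: if structures are drawn from the AlphaFold3 prior and the target is proportional to that prior times a Boltzmann factor $\exp(-E/k_BT)$, the prior cancels in the importance ratio, so each sample's weight reduces to its normalized Boltzmann factor. The function names `boltzmann_weights` and `effective_sample_size`, and the convention that `kT` is given in the same units as the energies, are hypothetical choices made for this example.

```python
import math
from typing import List, Sequence


def boltzmann_weights(energies: Sequence[float], kT: float = 1.0) -> List[float]:
    """Self-normalized importance weights for samples drawn from a structural prior.

    Assumed target:   pi(x) proportional to p_prior(x) * exp(-E(x) / kT)
    Assumed proposal: p_prior(x), e.g. unweighted AlphaFold3 samples.
    The prior cancels in pi / p_prior, leaving w_i proportional to exp(-E_i / kT).
    """
    if kT <= 0:
        raise ValueError("kT must be positive")
    log_w = [-e / kT for e in energies]
    shift = max(log_w)  # log-sum-exp shift avoids overflow and underflow
    unnorm = [math.exp(lw - shift) for lw in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def effective_sample_size(weights: Sequence[float]) -> float:
    """Kish effective sample size; equals len(weights) for uniform weights."""
    return 1.0 / sum(w * w for w in weights)


if __name__ == "__main__":
    # Hypothetical energies for four sampled structures (same units as kT).
    energies = [0.0, 1.0, 2.0, 2.0]
    w = boltzmann_weights(energies, kT=1.0)
    print([round(x, 3) for x in w])
    print(round(effective_sample_size(w), 2))
```

The weights are meant to enter ensemble statistics (for example, a weighted average over structures) rather than to discard samples. An effective sample size far below the ensemble size signals that a handful of low-energy structures dominate the reweighted estimate, in which case more prior samples are needed. For physical energies in kcal/mol, $k_BT$ is about 0.59 kcal/mol at 298 K.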

Paper Structure (57 sections, 37 equations, 11 figures, 10 tables, 3 algorithms)

Figures (11)

  • Figure 1: Gradient guidance versus inference-time optimization in experiment-guided AlphaFold3. (Left) In gradient guidance, the Pairformer conditioning variable $\bm{\mathcal{Z}}$ is fixed, and experimental gradients $\nabla_{\bm{\mathcal{X}}} \log p(\mathbf{y}|\bm{\mathcal{X}})$ are applied directly to the coordinates $\bm{\mathcal{X}}$ during reverse diffusion. (Right) In inference-time optimization (MSA Guidance), the conditioning embeddings $\bm{\mathcal{Z}}$ are updated using the experimental likelihood $\log p(\mathbf{y}|\bm{\mathcal{X}})$, while structures are denoised via reverse diffusion conditioned on the optimized embeddings. Dotted lines indicate gradient flow.
  • Figure 2: Nested inference-time optimization of AlphaFold3 conditioning embeddings. (Left) The outer loop runs $K$ diffusion processes, each initializing a new reverse diffusion trajectory from noise while carrying forward the optimized conditioning embeddings $\bm{\mathcal{Z}}$ from the previous diffusion process. Within each diffusion process, experimental likelihood gradients update the embeddings (inner loop), which then condition subsequent denoising steps; optional force-field-based Boltzmann weights $\mathbf{w}$ bias ensemble statistics toward thermodynamically plausible conformations. (Right) Conceptually, successive diffusion trajectories explore the MSA embedding space (left panel), with embeddings refined across outer iterations. The resulting optimized embeddings induce ensembles in conformation space (right panel) that concentrate on regions consistent with experimental observations while preserving structural diversity.
  • Figure 3: Inference-time (IT) optimization improves structural accuracy over guided and unguided baselines. (Top) 3AZY:A (1.45 Å) exhibits a bimodal distribution at residues 157-163. Unguided AF3 predicts a single mode, while guidance produces a bimodal ensemble with a poorly fitted backbone in one of the modes. IT-optimization recovers both modes with an accurate density fit, matching the deposited PDB structure. (Bottom) For 6I42:B (1.38 Å), AF3 mispredicts the bound 13-residue peptide. Guidance improves backbone placement but places side chains poorly, whereas IT-optimization yields accurate backbone and side-chain agreement. Numbers beneath each ensemble indicate cosine similarity to $F_{\mathrm{o}}$.
  • Figure 4: Energy-weighted inference improves the thermodynamic stability of the ensemble. Energy changes are shown relative to a uniformly weighted baseline (dashed line) for energy-weighted sampling and energy-weighted IT-Opt. Left: ProteinEBM. Right: AMBER99. A negative change indicates more stable structures.
  • Figure 5: Inference-time optimization, with and without energy weighting, reduces NOE restraint violations. NOE restraint violations for 2K0M in the deposited PDB ensemble and in ensembles from Guidance (Uniform), Inference-time Optimization (Uniform + IT-Opt), Energy-weighted Guidance, and Energy-weighted IT-Opt. Violated restraints are shown as lines colored by violation magnitude. The percentage of violated restraints and their median violation distance are reported below each structure. Bottom left: cumulative distribution of violation magnitudes across all five ensembles.
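
The nested loop described for Figure 2 can be illustrated with a toy model. This is a minimal sketch under strong simplifying assumptions, not the AlphaFold3 pipeline or the authors' algorithm: the conditioning embedding is a plain vector `z`, the denoiser's clean-structure prediction is the linear map `A z` plus fixed per-sample noise, each denoising step interpolates linearly from Gaussian noise toward that prediction, and the experimental log-likelihood is a unit-variance Gaussian on the (optionally weighted) ensemble mean. The names `nested_it_opt`, `matvec`, and `mat_t_vec` are hypothetical, and any weights are treated as constants with no gradient flowing through them.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(A: Matrix, v: Sequence[float]) -> Vector:
    """Compute A @ v for a row-major list-of-lists matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def mat_t_vec(A: Matrix, u: Sequence[float]) -> Vector:
    """Compute A^T @ u."""
    return [sum(A[r][c] * u[r] for r in range(len(A))) for c in range(len(A[0]))]


def nested_it_opt(
    y: Sequence[float],
    A: Matrix,
    z0: Sequence[float],
    n_samples: int = 8,
    K: int = 5,  # outer loop: number of diffusion processes
    T: int = 10,  # inner loop: denoising steps per process
    lr: float = 0.1,
    noise: float = 0.3,
    weights: Optional[Sequence[float]] = None,
    seed: int = 0,
) -> Tuple[Vector, List[Vector]]:
    rng = random.Random(seed)
    d = len(y)
    w = list(weights) if weights is not None else [1.0 / n_samples] * n_samples
    if len(w) != n_samples or abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("need one weight per sample, summing to 1")
    z = list(z0)
    xs: List[Vector] = []
    for _ in range(K):
        # New trajectory from noise; z is carried over from the previous process.
        eps = [[rng.gauss(0.0, noise) for _ in range(d)] for _ in range(n_samples)]
        xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_samples)]
        for t in range(T):
            # Toy denoiser prediction of the clean structure for each sample.
            mean = matvec(A, z)
            x_hat = [[m + e for m, e in zip(mean, e_i)] for e_i in eps]
            # Weighted ensemble mean and its residual against the observation y.
            xbar = [sum(w[i] * x_hat[i][j] for i in range(n_samples)) for j in range(d)]
            resid = [yj - xj for yj, xj in zip(y, xbar)]
            # Gradient of log p(y | ensemble) = -0.5 * ||y - xbar||^2 w.r.t. z.
            # Because the weights sum to 1, d(xbar)/dz = A, so the gradient is A^T resid.
            grad = mat_t_vec(A, resid)
            z = [zi + lr * gi for zi, gi in zip(z, grad)]
            # Denoising step conditioned on the updated embedding.
            mean = matvec(A, z)
            alpha = (t + 1) / T  # reaches 1 at the last step
            xs = [
                [(1 - alpha) * x + alpha * (m + e) for x, m, e in zip(x_i, mean, e_i)]
                for x_i, e_i in zip(xs, eps)
            ]
    return z, xs
```

Each outer iteration restarts from fresh noise while keeping `z`, so the final samples keep their per-sample spread from `eps` while the ensemble mean is pulled toward `y`. In the setting the captions describe, the likelihood would instead come from X-ray or NMR data and the optional weights from force-field energies.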