Predictive Feature Caching for Training-free Acceleration of Molecular Geometry Generation

Johanna Sommer; John Rachwan; Nils Fleischmann; Stephan Günnemann; Bertrand Charpentier

TL;DR

This work tackles the high inference cost of flow-matching models for molecular geometry generation with a training-free predictive caching strategy that forecasts intermediate hidden states across solver steps of an $SE(3)$-equivariant backbone. By caching late-layer features and predicting them at subsequent steps (via TaylorSeer or Adams–Bashforth schemes), the method preserves equivariance while reducing per-step computation: a $2\times$ reduction in wall-clock inference time at matched sample quality, up to $3\times$ with minimal quality degradation, and up to $7\times$ when combined with other general, lossless optimizations. The approach is compatible with pretrained models and complementary to training-based accelerations and system-level optimizations, with results reported on GEOM-Drugs (and QM9). In practice, this makes large-scale sampling (hundreds of thousands to millions of candidates) feasible in a reasonable time frame, which supports geometry-based molecular design workflows.
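
The equivariance claim can be checked in one line under a stated assumption: that the forecast $\hat{h}_t$ is a linear combination of cached features with scalar coefficients $c_j$ that do not depend on the orientation of the input, as is the case for generic finite-difference Taylor and Adams–Bashforth style extrapolation. The symbols $h$, $c_j$, $m$, and the representation $\rho$ are notation introduced here for illustration and are not taken from the paper.

```latex
\hat{h}_{t}(x) = \sum_{j=1}^{m} c_j \, h_{t_j}(x),
\qquad
h_{t_j}(g \cdot x) = \rho(g)\, h_{t_j}(x) \quad \forall g \in SE(3)
\;\Longrightarrow\;
\hat{h}_{t}(g \cdot x) = \sum_{j=1}^{m} c_j \, \rho(g)\, h_{t_j}(x) = \rho(g)\, \hat{h}_{t}(x).
```

The last equality uses only the linearity of $\rho(g)$, so any forecast of this form inherits the equivariance of the cached features.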

Abstract

Flow matching models generate high-fidelity molecular geometries but incur significant computational costs during inference, requiring hundreds of network evaluations. This inference overhead becomes the primary bottleneck when such models are employed in practice to sample large numbers of molecular candidates. This work presents a training-free caching strategy that accelerates molecular geometry generation by predicting intermediate hidden states across solver steps. The proposed method operates directly on the SE(3)-equivariant backbone, is compatible with pretrained models, and is orthogonal to existing training-based accelerations and system-level optimizations. Experiments on the GEOM-Drugs dataset demonstrate that caching achieves a twofold reduction in wall-clock inference time at matched sample quality and a speedup of up to 3x compared to the base model with minimal sample quality degradation. Because these gains compound with other optimizations, applying caching alongside other general, lossless optimizations yields as much as a 7x speedup.
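
To make the idea of predicting hidden states across solver steps concrete, the following is a minimal sketch of a Taylor-style forecast built from backward finite differences of cached features. It is not the paper's implementation: the function names `taylor_forecast` and `run_with_cache`, the `interval` and `order` parameters, and the stand-in `late_features` callable (which plays the role of the expensive late layers) are all hypothetical and chosen for illustration.

```python
import numpy as np


def taylor_forecast(cached, k, order=1):
    """Extrapolate k cache-intervals past the newest cached feature.

    cached: list of arrays, oldest first, taken at equally spaced solver steps.
    Derivatives are approximated by backward finite differences (assumption).
    """
    order = min(order, len(cached) - 1)  # fall back to plain reuse if history is short
    level = [np.asarray(c, dtype=float) for c in cached[-(order + 1):]]
    diffs = [level[-1]]  # zeroth difference: the newest feature itself
    for _ in range(order):
        level = [b - a for a, b in zip(level[:-1], level[1:])]
        diffs.append(level[-1])
    pred = np.zeros_like(diffs[0])
    factorial = 1.0
    for i, d in enumerate(diffs):
        if i > 0:
            factorial *= i
        pred += d * (k ** i) / factorial  # Taylor term: diff_i * k^i / i!
    return pred


def run_with_cache(late_features, n_steps, interval=3, order=1):
    """Call the expensive function every `interval` steps and forecast in between."""
    cached, outputs, full_calls = [], [], 0
    for step in range(n_steps):
        if step % interval == 0:
            feat = np.asarray(late_features(step), dtype=float)  # full computation
            cached.append(feat)
            full_calls += 1
        else:
            k = (step % interval) / interval  # fraction of a cache interval elapsed
            feat = taylor_forecast(cached, k, order)
        outputs.append(feat)
    return outputs, full_calls


if __name__ == "__main__":
    # Toy feature trajectory that is linear in the step index (hypothetical).
    toy = lambda s: np.array([1.0 + 0.5 * s, -2.0 * s])
    outs, calls = run_with_cache(toy, n_steps=9, interval=3, order=1)
    print("full evaluations:", calls, "of 9 steps")
    print("forecast error at step 4:", np.abs(outs[4] - toy(4)).max())
```

In this toy setting the forecast is exact once two features are cached, because the trajectory is linear; real hidden states are not, so the choice of `interval` and `order` trades speed against forecast error.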
