
Freeze, Diffuse, Decode: Geometry-Aware Adaptation of Pretrained Transformer Embeddings for Antimicrobial Peptide Design

Pankhil Gawade, Adam Izdebski, Myriam Lizotte, Kevin R. Moon, Jake S. Rhodes, Guy Wolf, Ewa Szczurek

TL;DR

The paper addresses the challenge of transferring pretrained transformer embeddings to molecular design tasks without disturbing their geometric structure. It introduces Freeze, Diffuse, Decode (FDD), a diffusion-based framework that adapts frozen embeddings along their intrinsic manifold to align with task signals, preserving pretrained geometry while enhancing expressivity. Empirical results in antimicrobial peptide design show that FDD improves predictive performance, enables retrieval of novel active peptides, and supports smooth latent-space interpolation, yielding interpretable, low-dimensional representations. The work demonstrates that geometry-aware diffusion can serve as a robust alternative to fine-tuning for transferring large pretrained representations to data-scarce molecular tasks.

Abstract

Pretrained transformers provide rich, general-purpose embeddings, which are transferred to downstream tasks. However, current transfer strategies, fine-tuning and probing, either distort the pretrained geometric structure of the embeddings or lack sufficient expressivity to capture task-relevant signals. These issues become even more pronounced when supervised data are scarce. Here, we introduce Freeze, Diffuse, Decode (FDD), a novel diffusion-based framework that adapts pretrained embeddings to downstream tasks while preserving their underlying geometric structure. FDD propagates the supervised signal along the intrinsic manifold of the frozen embeddings, enabling a geometry-aware adaptation of the embedding space. Applied to antimicrobial peptide design, FDD yields low-dimensional, predictive, and interpretable representations that support property prediction, retrieval, and latent-space interpolation.
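
The abstract does not give the details of the diffusion step, so the following is only a minimal, generic sketch of what "propagating a supervised signal along the manifold of frozen embeddings" can look like. It is not the authors' algorithm. It assumes a k-nearest-neighbour graph with Gaussian weights as a stand-in for the manifold, and the names `knn_diffusion_operator`, `diffuse_signal`, `k`, and `steps` are hypothetical choices made for illustration.

```python
import numpy as np


def knn_diffusion_operator(X, k=5):
    """Row-stochastic random-walk operator on a symmetrised kNN graph.

    X holds frozen embeddings, one row per sequence. The graph is an
    assumed proxy for the embedding manifold, not the paper's construction.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped at zero for stability.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    d2_off = d2.copy()
    np.fill_diagonal(d2_off, np.inf)  # exclude self from neighbour search
    idx = np.argsort(d2_off, axis=1)[:, :k]
    # Adaptive bandwidth: squared distance to the k-th neighbour.
    sigma2 = d2[np.arange(n), idx[:, -1]] + 1e-12
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / sigma2[rows])
    W = np.maximum(W, W.T)  # symmetrise the graph
    W += np.eye(n)          # self-loops make the walk lazy
    return W / W.sum(axis=1, keepdims=True)


def diffuse_signal(P, y, labeled_mask, steps=10):
    """Spread labels over the graph while the embeddings stay untouched."""
    f = np.where(labeled_mask, y, 0.0).astype(float)
    for _ in range(steps):
        f = P @ f
        f[labeled_mask] = y[labeled_mask]  # clamp the known labels
    return f
```

In this sketch the embeddings themselves are never updated; only the label signal moves along graph edges, which is one simple way to respect the pretrained geometry. With only a handful of labelled points, unlabelled points receive a score that reflects their graph proximity to labelled ones.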

Paper Structure

This paper contains 27 sections, 9 equations, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Overview of the FDD framework.
  • Figure 2: Embedding space for AMP/non-AMP retrieval using FDD. Red points indicate active peptide embeddings from the held-out dataset projected into the FDD latent space.
  • Figure 3: Interpolated trajectory in the FDD latent space between non-AMP and AMP regions.
  • Figure 4: Evolution of physicochemical descriptors along the latent interpolation path: (a) net charge, (b) isoelectric point, and (c) hydrophobic amino-acid ratio. All descriptors steadily increase across the interpolation path, indicating a smooth transition from non-AMP to AMP-like regions in latent space.
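
The three descriptors named in the Figure 4 caption can be approximated directly from a peptide sequence. The sketch below is an illustration under stated assumptions, not the paper's pipeline: the pKa values are one illustrative set (published scales differ), the hydrophobic residue set `AILMFWV` is an assumed definition, and the function names are hypothetical.

```python
# Illustrative side-chain and terminal pKa values (assumed; scales differ).
PKA_POS = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA, C_TERM_PKA = 8.6, 3.6
HYDROPHOBIC = set("AILMFWV")  # assumed definition of "hydrophobic"


def net_charge(seq, ph=7.0):
    """Henderson-Hasselbalch estimate of net charge at a given pH."""
    pos = 1.0 / (1.0 + 10.0 ** (ph - N_TERM_PKA))
    neg = 1.0 / (1.0 + 10.0 ** (C_TERM_PKA - ph))
    for aa in seq:
        if aa in PKA_POS:
            pos += 1.0 / (1.0 + 10.0 ** (ph - PKA_POS[aa]))
        elif aa in PKA_NEG:
            neg += 1.0 / (1.0 + 10.0 ** (PKA_NEG[aa] - ph))
    return pos - neg


def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """pH where the estimated net charge is zero, found by bisection.

    Bisection works because net charge decreases monotonically with pH.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def hydrophobic_ratio(seq):
    """Fraction of residues in the assumed hydrophobic set."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)
```

Evaluating these functions on sequences decoded at successive points of an interpolation path would produce curves of the kind the caption describes, with lysine- and arginine-rich sequences scoring higher on net charge and isoelectric point.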