Thin Bridges for Drug Text Alignment: Lightweight Contrastive Learning for Target Specific Drug Retrieval

Mallikarjuna Tupakula

TL;DR

This work tackles cross-modal drug-text alignment by using thin contrastive bridges that connect frozen unimodal encoders for chemical fingerprints and biomedical text. The authors train dual linear projections with a symmetric InfoNCE objective in a shared $d=256$ space, using a temperature of $T=0.07$ and a margin-based hard-negative strategy to disambiguate drugs acting on the same target. On ChEMBL with scaffold-based splits, the ECFP4-based bridge combined with enriched text achieves strong cross-modal retrieval, including high within-target discrimination (e.g., Recall@1 up to ~0.76 and MRR ~0.86), and generalizes beyond seen scaffolds. The results argue that compute-efficient thin bridges can provide scalable, scaffold-aware retrieval foundations for downstream generative drug design pipelines, enabling rapid, target-specific drug-text alignment with practical implications for precision medicine.
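To make the objective concrete, here is a minimal NumPy sketch of a symmetric InfoNCE loss over two linear projection heads. It is an illustration under stated assumptions, not the authors' implementation: the input dimensions (2048-bit fingerprints, 768-dimensional text embeddings), the random initialization, and the function names are hypothetical, and the margin-based hard-negative term is omitted because the summary does not give its exact form. Only $d=256$ and $T=0.07$ come from the text above.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Row-wise unit normalization so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(mol_feats, txt_feats, W_mol, W_txt, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (molecule, text) features.

    Row i of mol_feats is assumed to be paired with row i of txt_feats;
    every other row in the batch serves as a negative.
    """
    z_mol = l2_normalize(mol_feats @ W_mol)   # (batch, d)
    z_txt = l2_normalize(txt_feats @ W_txt)   # (batch, d)
    logits = (z_mol @ z_txt.T) / temperature  # (batch, batch) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    loss_mol_to_txt = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_txt_to_mol = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_mol_to_txt + loss_txt_to_mol)

# Toy batch. fp_dim and txt_dim are assumptions; d = 256 follows the summary.
rng = np.random.default_rng(0)
batch, fp_dim, txt_dim, d = 8, 2048, 768, 256
mol_feats = rng.integers(0, 2, size=(batch, fp_dim)).astype(np.float64)
txt_feats = rng.normal(size=(batch, txt_dim))
W_mol = rng.normal(scale=fp_dim ** -0.5, size=(fp_dim, d))
W_txt = rng.normal(scale=txt_dim ** -0.5, size=(txt_dim, d))
loss = symmetric_info_nce(mol_feats, txt_feats, W_mol, W_txt)
```

Because the encoders are frozen, the only trainable parameters in this sketch are `W_mol` and `W_txt`, which is what keeps the bridge "thin". In practice the loss would be minimized with an autodiff framework rather than evaluated once as here.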

Abstract

Multimodal foundation models hold promise for drug discovery and biomedical applications, but most existing approaches rely on heavy pretraining or large-scale multimodal corpora. We investigate whether thin contrastive bridges, lightweight projection heads over frozen unimodal encoders, can align chemical and textual representations without training a full multimodal model. Using paired mechanisms from ChEMBL, we align ECFP4 molecular fingerprints with biomedical sentence embeddings through dual linear projections trained with a contrastive objective. To better handle drugs sharing the same therapeutic target, we incorporate hard-negative weighting and a margin loss. Evaluation under scaffold-based splits, which require generalization across disjoint chemical cores, demonstrates that our approach achieves non-trivial cross-modal alignment and substantially improves within-target discrimination compared to frozen baselines. These results suggest that thin bridges offer a compute-efficient alternative to large-scale multimodal pretraining, enabling scaffold-aware drug-text alignment and target-specific retrieval in precision medicine.
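The distinction between global retrieval and within-target discrimination can be illustrated with a short, dependency-free Python sketch. It assumes that "within-target" means ranking the correct text only against texts whose drugs share the query's target; the paper may define the grouped metric differently. The function names, the toy similarity matrix, and the target labels are hypothetical.

```python
def ranks_of_true_match(sim, groups=None):
    """1-based rank of the paired text (index i) for each drug query i.

    sim[i][j] is the similarity between drug i and text j. If `groups` is
    given, candidates are restricted to texts sharing the query's group label.
    """
    ranks = []
    for i, row in enumerate(sim):
        candidates = [
            j for j in range(len(row))
            if groups is None or groups[j] == groups[i]
        ]
        # Rank = 1 + number of competing candidates scored above the true match.
        rank = 1 + sum(1 for j in candidates if j != i and row[j] > row[i])
        ranks.append(rank)
    return ranks

def recall_at_k(ranks, k):
    # Fraction of queries whose true match appears in the top k.
    return sum(r <= k for r in ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

# Toy example: four drug-text pairs, two hypothetical targets "A" and "B".
sim = [
    [0.9, 0.8, 0.1, 0.2],
    [0.2, 0.6, 0.7, 0.1],  # confused with a text from the other target
    [0.2, 0.1, 0.5, 0.8],  # confused with a text from the same target
    [0.4, 0.3, 0.2, 0.9],
]
targets = ["A", "A", "B", "B"]

global_ranks = ranks_of_true_match(sim)
within_ranks = ranks_of_true_match(sim, groups=targets)

global_r1 = recall_at_k(global_ranks, 1)
within_r1 = recall_at_k(within_ranks, 1)
global_mrr = mean_reciprocal_rank(global_ranks)
within_mrr = mean_reciprocal_rank(within_ranks)
```

In this toy case the second query's error disappears once candidates are restricted to its own target, while the third query's error remains because the confusing text belongs to the same target. The latter is the kind of mistake that hard-negative weighting and a margin loss are intended to reduce.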

Paper Structure

This paper contains 17 sections, 1 equation, 6 figures, 1 table.

Figures (6)

  • Figure 1: ECFP4 bridge with enriched text (text_rich). Left: before training. Right: after training, showing clear diagonal alignment.
  • Figure 2: Cosine similarity matrices on the scaffold split test set (first $K{=}40$ pairs). Left: random heads. Right: trained bridge with strong diagonal alignment.
  • Figure 3: Cumulative Match (Recall@$k$) on the scaffold split test set. Global retrieval (blue) improves steadily with $k$, while within-target retrieval (orange) climbs steeply, showing most correct matches appear in small sets.
  • Figure 4: Ablation on grouped Recall@1 across temperature $T$, margin $m$, and drug name inclusion in text_rich. Best: WithDrug, $T{=}0.05$, $m{=}0.15$.
  • Figure 5: Cosine similarity heatmap between the first $K{=}40$ drug-text pairs after enrichment. The uniform pattern indicates that frozen encoders alone do not yield meaningful alignment.
  • ...and 1 more figures