Thin Bridges for Drug-Text Alignment: Lightweight Contrastive Learning for Target-Specific Drug Retrieval
Mallikarjuna Tupakula
TL;DR
This work tackles cross-modal drug-text alignment by using thin contrastive bridges that connect frozen unimodal encoders for chemical fingerprints and biomedical text. The authors train dual linear projections with a symmetric InfoNCE objective in a shared $d=256$ space, employing a temperature of $T=0.07$ and a margin-based hard-negative strategy to disambiguate drugs acting on the same target. On ChEMBL with scaffold-based splits, the ECFP4-based bridge combined with enriched text achieves strong cross-modal retrieval, including high within-target discrimination (e.g., Recall@1 up to ~0.76 and MRR ~0.86), and generalizes beyond seen scaffolds. The results argue that compute-efficient thin bridges can provide scalable, scaffold-aware retrieval foundations for downstream generative drug design pipelines, enabling rapid, target-specific drug-text alignment with practical implications for precision medicine.
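The symmetric InfoNCE objective and the retrieval metrics summarized above can be sketched in a few lines. This is a minimal pure-Python illustration, assuming L2-normalized linear projections, cosine-similarity logits divided by the temperature T = 0.07, and matched drug-text pairs on the diagonal of the batch similarity matrix. The function names (`project`, `similarity_matrix`, `symmetric_infonce`, `recall_at_1_and_mrr`) are hypothetical, and the sketch only evaluates the loss and metrics; an actual bridge would learn the projections into d = 256 with an autodiff framework.

```python
import math


def project(W, x):
    """Apply a linear map W (d_out rows of length d_in) to x, then L2-normalize."""
    z = [sum(w * v for w, v in zip(row, x)) for row in W]
    norm = math.sqrt(sum(v * v for v in z)) or 1.0  # guard against a zero vector
    return [v / norm for v in z]


def similarity_matrix(mol_feats, txt_feats, W_mol, W_txt):
    """S[i][j] = cosine similarity between projected drug i and projected text j."""
    zm = [project(W_mol, x) for x in mol_feats]
    zt = [project(W_txt, y) for y in txt_feats]
    return [[sum(a * b for a, b in zip(m, t)) for t in zt] for m in zm]


def _logsumexp(vals):
    top = max(vals)  # subtract the max for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in vals))


def symmetric_infonce(S, temperature=0.07):
    """Mean of drug->text (rows) and text->drug (columns) cross-entropy losses.

    Matched pairs sit on the diagonal of S; every other entry is a negative.
    """
    n = len(S)
    logits = [[s / temperature for s in row] for row in S]
    d2t = sum(_logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    t2d = sum(_logsumexp([logits[i][j] for i in range(n)]) - logits[j][j]
              for j in range(n)) / n
    return 0.5 * (d2t + t2d)


def recall_at_1_and_mrr(S):
    """Drug->text retrieval metrics; the correct text for drug i is text i."""
    n = len(S)
    hits, reciprocal_ranks = 0, 0.0
    for i, row in enumerate(S):
        # Rank of the true text = 1 + number of texts scored strictly higher
        # (ties are resolved in favor of the true text in this sketch).
        rank = 1 + sum(1 for j in range(n) if j != i and row[j] > row[i])
        if rank == 1:
            hits += 1
        reciprocal_ranks += 1.0 / rank
    return hits / n, reciprocal_ranks / n


# Toy check: identity projections and one-hot features stand in for ECFP4 bits,
# sentence embeddings, and trained 256-dimensional projections.
eye = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
S = similarity_matrix(eye, eye, eye, eye)
print(symmetric_infonce(S))    # close to 0: every pair is perfectly aligned
print(recall_at_1_and_mrr(S))  # (1.0, 1.0)
```

Computing the similarity matrix once and reusing it for both the loss and the metrics keeps training and evaluation consistent. Restricting `S` to the rows and columns of drugs that share a single target would give a within-target variant of the same metrics; whether the reported within-target numbers are computed exactly this way is an assumption.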
Abstract
Multimodal foundation models hold promise for drug discovery and biomedical applications, but most existing approaches rely on heavy pretraining or large-scale multimodal corpora. We investigate whether thin contrastive bridges (lightweight projection heads over frozen unimodal encoders) can align chemical and textual representations without training a full multimodal model. Using drug-mechanism pairs from ChEMBL, we align ECFP4 molecular fingerprints with biomedical sentence embeddings through dual linear projections trained with a contrastive objective. To better distinguish drugs that share the same therapeutic target, we incorporate hard-negative weighting and a margin loss. Evaluation under scaffold-based splits, which require generalization across disjoint chemical cores, shows that our approach achieves non-trivial cross-modal alignment and substantially improves within-target discrimination compared to frozen baselines. These results suggest that thin bridges offer a compute-efficient alternative to large-scale multimodal pretraining, enabling scaffold-aware drug-text alignment and target-specific retrieval in precision medicine.
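The abstract does not spell out the exact form of the hard-negative weighting or the margin loss. One plausible reading, sketched below under that assumption, up-weights texts of other drugs with the same target in the InfoNCE denominator by a hypothetical factor `beta`, and adds a hinge term (hypothetical `margin` and weight `lam`) asking each drug's true text to beat its hardest same-target text. Only the drug-to-text direction is shown, the function name `hard_negative_loss` is hypothetical, and none of the default values come from the paper.

```python
import math


def hard_negative_loss(S, targets, temperature=0.07, beta=2.0, margin=0.2, lam=1.0):
    """Drug->text loss with same-target hard negatives (one possible formulation).

    S[i][j] is the cosine similarity between drug i and text j, with matched
    pairs on the diagonal; targets[i] is the target label of drug i.
    beta, margin and lam are hypothetical settings, not values from the paper.
    """
    n = len(S)
    total = 0.0
    for i in range(n):
        same_target = [j for j in range(n) if j != i and targets[j] == targets[i]]
        # Weighted InfoNCE: log(sum_j w_j * exp(S_ij / T)) - S_ii / T, where
        # w_j = beta for same-target texts and 1 otherwise (added as log-weights).
        shifted = [S[i][j] / temperature + (math.log(beta) if j in same_target else 0.0)
                   for j in range(n)]
        top = max(shifted)
        log_denom = top + math.log(sum(math.exp(v - top) for v in shifted))
        infonce = log_denom - S[i][i] / temperature
        # Hinge term: the true text should beat the hardest same-target text
        # by at least `margin` in cosine similarity.
        hinge = 0.0
        if same_target:
            hardest = max(S[i][j] for j in same_target)
            hinge = max(0.0, margin - S[i][i] + hardest)
        total += infonce + lam * hinge
    return total / n


# Toy batch: drugs 0 and 1 share a target, drug 2 acts on a different one.
S = [[0.90, 0.85, 0.10],
     [0.80, 0.90, 0.10],
     [0.10, 0.10, 0.90]]
targets = ["target_A", "target_A", "target_B"]
print(hard_negative_loss(S, targets))
```

Setting `beta=1.0` and `lam=0.0` recovers the plain drug-to-text InfoNCE term, which makes it easy to isolate how much each component contributes to within-target discrimination.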
