Inductive-Associative Meta-learning Pipeline with Human Cognitive Patterns for Unseen Drug-Target Interaction Prediction

Xiaoqing Lian, Jie Zhu, Tianxu Lv, Shiyun Nie, Hang Fan, Guosheng Wu, Yunjun Ge, Lihua Li, Xiangxiang Zeng, Xiang Pan

TL;DR

BioBridge introduces an inductive-associative meta-learning pipeline for unseen drug–target interaction prediction that relies on limited sequence data. It combines a multi-level encoder with adversarial training to extract transferable binding principles and a dynamic prototype meta-learning framework to reason over weakly related annotations. Across cold-pair, cross-domain, zero-shot, and few-shot splits, BioBridge achieves state-of-the-art or competitive results, with notable improvements in unseen-protein scenarios and practical virtual screening demonstrations. The approach offers interpretable interaction fingerprints and scalable deployment with sequence data alone, highlighting its potential to accelerate early-stage drug discovery, while acknowledging current limitations in 3D structural integration and encoder speed.

Abstract

Significant differences in protein structures hinder the generalization of existing drug-target interaction (DTI) models, which often rely heavily on pre-learned binding principles or detailed annotations. In contrast, BioBridge adopts an Inductive-Associative pipeline inspired by the workflow of scientists, who draw on accumulated expertise to gain insights into novel drug-target pairs from weakly related references. BioBridge predicts novel drug-target interactions using limited sequence data, incorporating multi-level encoders with adversarial training to accumulate transferable binding principles. Building on these principles, BioBridge employs a dynamic prototype meta-learning framework to associate insights from weakly related annotations, enabling robust predictions for previously unseen drug-target pairs. Extensive experiments demonstrate that BioBridge surpasses existing models, especially for unseen proteins. Notably, when only homologous protein binding data is available, BioBridge proves effective for virtual screening of the epidermal growth factor receptor and adenosine receptor, underscoring its potential in drug discovery.

Paper Structure

This paper contains 28 sections, 14 equations, 6 figures, 9 tables.

Figures (6)

  • Figure 1: (a) Dataset Preparation: Protein sequences are clustered into $n$ classes. Drug molecular sequences are converted to molecular graphs with RDKit [landrum2013rdkit]. Based on the protein clusters, the data is split into source and target domains and then divided into meta-tasks. (b) BioBridge Encoder: BioBridge takes protein sequences and molecular graphs as input, using a CNN and a GCN to model internal forces. Bilinear attention captures interactions at multiple levels, while gated attention aggregates interpretable interaction fingerprints. (c) Stage One: BioBridge pre-trains with labelled source data and unlabelled target data. The loss function $\mathcal{L}_s$ learns binding information, while $\mathcal{L}_d$ handles cross-domain adversarial learning. This binding knowledge is transferred to the next stage. (d) Stage Two: Tasks from the target domain form support and query sets. The concatenated interactions are treated as Q and K, and the support interactions as V. A dynamic prototype learning module defines unique class prototypes. Cosine similarity determines binding status, with an adaptive loss function $\mathcal{L}_f$ facilitating learning. (e) BioBridge generalizes well across tasks, providing interpretable interaction fingerprints for biological insights.
  • Figure 2: (a) Performance comparison on the BindingDB, BioSNAP, and Human datasets using random splitting. The bubble size corresponds to the value of the metric. (b) Comparison of cross-domain performance on the BindingDB and BioSNAP datasets using cluster-based pair splitting. (c) Zero-shot comparison under meta cross-domain splitting on the BindingDB and BioSNAP datasets, indicating that protein differences are an important factor limiting drug-target prediction.
  • Figure 3: (a) t-SNE visualization of drug-target pairs across various tasks, with lower Davies-Bouldin (DB) indices indicating superior performance. (b) Visualization of attention over the drug ligand and the target binding pocket, generated using RDKit [landrum2013rdkit] for drug molecule mapping and PyMOL [delano2002pymol] for target plotting. PLIP [adasme2021plip] is used to plot the forces between drug and target. The molecular diagram highlights in red the top 20$\%$ of positions the model attends to, while the target docking diagram depicts the model's focus in yellow, aligning with the actual interacting residues. (c) Performance comparison on the PDBBind v2020 dataset, with $\uparrow$ indicating that higher scores are better and $\downarrow$ that lower scores are better. (d) Virtual screening examples from the tyrosinase and adenosine receptor families, featuring the top 10$\%$ scoring compounds when the query protein is scarce in the support set. (e) In these virtual screenings, color represents the query protein's frequency in the support set, and bubble size corresponds to score magnitude. The top 10$\%$ of meta-tasks for adenosine receptors even achieved 100$\%$ accuracy. BioBridge surpassed DrugBAN in traditional classification tasks, demonstrating superior adaptability to novel drug-target pairs.
  • Figure A1: Training losses of the different BioBridge variants as a function of epoch.
  • Figure A2: Ablation experiments for CADA modules in cross-domain settings.
  • ...and 1 more figure
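
The Stage Two description in Figure 1(d) says that class prototypes are built from the support set and that cosine similarity determines binding status. The following is a minimal sketch of that idea only. It assumes plain mean-of-support prototypes rather than the paper's dynamic prototype module, and the function names and the toy two-dimensional "interaction fingerprints" are hypothetical illustrations, not taken from the BioBridge code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def class_prototypes(support_vectors, support_labels):
    """Average the support vectors of each class into one prototype.

    Assumption: a simple mean stands in for the dynamic prototype module.
    """
    grouped = {}
    for vec, label in zip(support_vectors, support_labels):
        grouped.setdefault(label, []).append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(query_vector, prototypes):
    """Return the label of the most similar prototype and all scores."""
    scores = {
        label: cosine(query_vector, proto)
        for label, proto in prototypes.items()
    }
    return max(scores, key=scores.get), scores


# Toy meta-task: label 1 = binding, label 0 = non-binding (hypothetical data).
support = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
labels = [1, 1, 0, 0]

protos = class_prototypes(support, labels)
label, scores = predict([0.8, 0.3], protos)
print(label, scores)
```

In this toy task the query vector lies closer in angle to the binding prototype, so it is assigned label 1. A learned, task-dependent prototype would replace the mean in `class_prototypes` while leaving the cosine comparison unchanged.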