Blend the Separated: Mixture of Synergistic Experts for Data-Scarcity Drug-Target Interaction Prediction
Xinlong Zhai, Chunchen Wang, Ruijia Wang, Jiazheng Kang, Shujie Li, Boyu Chen, Tengfei Ma, Zikai Zhou, Cheng Yang, Chuan Shi
TL;DR
This paper tackles drug-target interaction (DTI) prediction under data scarcity by introducing MoseDTI, a mixture of synergistic experts with an adaptive gating mechanism. The extrinsic expert leverages knowledge graph embeddings learned from unlabeled data. The intrinsic expert encodes drugs and targets from their structural and sequence information. A gating model then fuses the two experts' predictions adaptively for each sample. Crucially, the two experts supervise each other through pseudo-labeling to exploit abundant unlabeled data, which keeps performance robust when one perspective of input data is missing or labels are scarce. Experiments on three real-world datasets show significant improvements over state-of-the-art methods, with gains of up to 53.53% under input data and/or label scarcity, and the model still outperforms existing methods when data are abundant.
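The adaptive fusion, and the fallback to a single expert when one perspective is missing, can be illustrated with a short sketch. This is not MoseDTI's actual gating model, whose architecture is not described here. It assumes the following:

- Each expert outputs an interaction probability.
- The gate is a logistic model over hypothetical per-sample features.
- A missing perspective is marked with `None`.

The names `gate_weight` and `fuse` and the feature choices are illustrative only.

```python
import math
from typing import Optional, Sequence


def gate_weight(features: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Hypothetical per-sample gate: a logistic model returning the weight g of the intrinsic expert.

    With two experts, a sigmoid over one logit is equivalent to a softmax over two logits.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(f * w for f, w in zip(features, weights))
    # Numerically stable sigmoid: avoid exp() overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fuse(p_intrinsic: Optional[float], p_extrinsic: Optional[float], g: float) -> float:
    """Blend the two experts' interaction probabilities; None marks a missing perspective."""
    if p_intrinsic is None and p_extrinsic is None:
        raise ValueError("at least one perspective must be available")
    if p_extrinsic is None:  # e.g. a new drug with no knowledge-graph neighbours
        return p_intrinsic
    if p_intrinsic is None:  # e.g. a target with no usable structure/sequence data
        return p_extrinsic
    return g * p_intrinsic + (1.0 - g) * p_extrinsic


# Usage with two hypothetical gate features: [log(1 + KG degree), structure-available flag].
g = gate_weight([math.log1p(3), 1.0], weights=[-0.8, 1.5], bias=0.2)
p = fuse(0.9, 0.3, g)
```

When one perspective is missing, the sketch routes all weight to the expert that is present. This is the same as renormalizing a two-expert softmax over the available experts, and it is one simple way to keep a model functional under input data scarcity.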
Abstract
Drug-target interaction (DTI) prediction is essential in many applications, including drug discovery and clinical applications. Two perspectives of input data are widely used in DTI prediction: intrinsic data describe how drugs or targets are constructed, and extrinsic data describe how drugs or targets are related to other biological entities. However, either perspective can be scarce for some drugs or targets, especially unpopular or newly discovered ones. Ground-truth labels for specific interaction types can also be scarce. We therefore propose the first method to tackle DTI prediction under input data and/or label scarcity. To keep the model functional when only one perspective of input data is available, we design two separate experts that process intrinsic and extrinsic data respectively, and we fuse them adaptively for each sample. Furthermore, to make the two perspectives complement each other and to remedy label scarcity, the two experts synergize in a mutually supervised way that exploits the enormous amount of unlabeled data. Extensive experiments on three real-world datasets under different extents of input data scarcity and/or label scarcity demonstrate that our model outperforms state-of-the-art methods significantly and consistently, with a maximum improvement of 53.53%. We also test our model without any data scarcity, and it still outperforms current methods.
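Mutual supervision between the two experts can be sketched as confidence-thresholded pseudo-labeling in the style of co-training. Each expert's confident predictions on unlabeled drug-target pairs become hard pseudo-labels for the other expert. The exact selection rule and loss used by MoseDTI are not given in this excerpt. The symmetric threshold, the binary cross-entropy loss, and the helper names (`select_pseudo_labels`, `mutual_pseudo_loss`) below are therefore assumptions for illustration.

```python
import math
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (drug_id, target_id); identifiers are hypothetical


def select_pseudo_labels(probs: Dict[Pair, float], threshold: float = 0.9) -> Dict[Pair, int]:
    """Turn one expert's confident predictions into hard 0/1 pseudo-labels."""
    if not 0.5 < threshold < 1.0:
        raise ValueError("threshold must lie in (0.5, 1.0)")
    labels: Dict[Pair, int] = {}
    for pair, p in probs.items():
        if p >= threshold:
            labels[pair] = 1  # confidently interacting
        elif p <= 1.0 - threshold:
            labels[pair] = 0  # confidently non-interacting
    return labels  # uncertain pairs are skipped


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy of probability p against label y, clipped for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mutual_pseudo_loss(
    probs_a: Dict[Pair, float], probs_b: Dict[Pair, float], threshold: float = 0.9
) -> Tuple[float, float]:
    """Return (loss of expert A on B's pseudo-labels, loss of expert B on A's pseudo-labels)."""

    def mean_loss(student: Dict[Pair, float], teacher_labels: Dict[Pair, int]) -> float:
        terms = [bce(student[pair], y) for pair, y in teacher_labels.items() if pair in student]
        return sum(terms) / len(terms) if terms else 0.0

    labels_from_a = select_pseudo_labels(probs_a, threshold)
    labels_from_b = select_pseudo_labels(probs_b, threshold)
    return mean_loss(probs_a, labels_from_b), mean_loss(probs_b, labels_from_a)


# Usage on three unlabeled pairs (probabilities are made up for illustration).
intrinsic = {("d1", "t1"): 0.95, ("d1", "t2"): 0.50, ("d2", "t1"): 0.03}
extrinsic = {("d1", "t1"): 0.60, ("d1", "t2"): 0.92, ("d2", "t1"): 0.20}
loss_intrinsic, loss_extrinsic = mutual_pseudo_loss(intrinsic, extrinsic)
```

In a real training loop, the teacher's predictions would be treated as fixed targets, so no gradient flows through the pseudo-labels. These losses would be added to the supervised loss on the scarce labeled pairs.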
