MOTIVE: A Drug-Target Interaction Graph For Inductive Link Prediction

John Arevalo; Ellen Su; Anne E Carpenter; Shantanu Singh

Abstract

Drug-target interaction (DTI) prediction is crucial for identifying new therapeutics and detecting mechanisms of action. While structure-based methods accurately model physical interactions between a drug and its protein target, cell-based assays such as Cell Painting can better capture complex drug-target interactions. This paper introduces MOTIVE, a Morphological cOmpound Target Interaction Graph dataset comprising Cell Painting features for 11,000 genes and 3,600 compounds, along with their relationships extracted from seven publicly available databases. We provide random, cold-source (new drugs), and cold-target (new genes) data splits to enable rigorous evaluation under realistic use cases. Our benchmark results show that graph neural networks that use Cell Painting features consistently outperform those that learn from graph structure alone, feature-based models, and topological heuristics. MOTIVE accelerates both graph ML research and drug discovery by promoting the development of more reliable DTI prediction models. MOTIVE resources are available at https://github.com/carpenter-singh-lab/motive.

Paper Structure (17 sections, 1 equation, 3 figures, 4 tables, 2 algorithms)

Introduction
Related work
The MOTIVE dataset
Morphological profiles extraction
Annotation collection
Graph construction
Data splitting
Negative sampling algorithm
Models
Results
DTI prediction improves with CP features
Inductive link prediction benefits from graph structure and CP features
Ablation studies with graph structure
Zero-shot prediction potential
Discussion
...and 2 more sections

Figures (3)

Figure 1: Schematic of the random split (top row) and cold-source split (bottom row). The left-most graph illustrates the actual partitioning of the edges, and the three graphs to the right show which nodes and edges are visible to the GNN models during training, validation, and testing. The number of edges in each partition is not representative of our true 70/10/20 ratios. Cold-target split is symmetrical to cold-source split. The model aggregates neighbor features via the message-passing edges (solid lines) and makes predictions on the supervision edges (dotted lines).
Figure 2: Test metrics for GraphSAGE$_{CP}$ and GraphSAGE$_{embs}$ for all four graph structures, with random data splits, averaged over 5 runs. The colored stratifications of each bar show how model performance decreases as edge types are removed from the graph. Note that the bars of each color are overlaid in the order specified in the legend, so a structure's color appears only if that structure performed worse than the previous one.
Figure 3: Average precision distributions for test set sources and targets. The GIN$_{CP}$ model was trained on cold-source split data. Each point in the histogram is a source or target node, and the color indicates whether that node had been seen during model training.
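
The cold-source split named in the abstract and illustrated in Figure 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `cold_source_split`, the toy edge list, and the choice to apply the 70/10/20 ratios to source nodes (so that edge counts are only approximately in that proportion) are all assumptions made for this example. The idea it shows is that every edge of a given source (compound) lands in exactly one partition, so validation and test sources are never seen during training.

```python
import random


def cold_source_split(edges, ratios=(0.7, 0.1, 0.2), seed=0):
    """Partition (source, target) edges so that no source node is shared
    between train, validation, and test (hypothetical helper)."""
    # Shuffle the unique source nodes reproducibly.
    sources = sorted({s for s, _ in edges})
    rng = random.Random(seed)
    rng.shuffle(sources)

    # Assumption: ratios are applied to source nodes, not to edges.
    n_train = round(ratios[0] * len(sources))
    n_valid = round(ratios[1] * len(sources))
    train_src = set(sources[:n_train])
    valid_src = set(sources[n_train:n_train + n_valid])

    # Each edge follows its source node into a single partition.
    split = {"train": [], "valid": [], "test": []}
    for s, t in edges:
        if s in train_src:
            split["train"].append((s, t))
        elif s in valid_src:
            split["valid"].append((s, t))
        else:
            split["test"].append((s, t))
    return split


# Toy graph: 10 compounds, each linked to 3 of 5 genes.
edges = [(f"c{i}", f"g{(i + j) % 5}") for i in range(10) for j in range(3)]
split = cold_source_split(edges)
```

A cold-target split would be the mirror image: group edges by their target (gene) instead of their source. Applying the ratios to nodes rather than edges keeps the sketch short; matching edge-level ratios exactly would need an extra balancing step that is not shown here.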
