Fast Polypharmacy Side Effect Prediction Using Tensor Factorisation
Oliver Lloyd, Yi Liu, Tom R. Gaunt
TL;DR
The paper predicts polypharmacy adverse drug reactions by applying tensor factorisation to a knowledge-graph representation. Working within LibKGE, it performs exhaustive hyperparameter optimisation and compares two encodings of monopharmacy data: as self-looping edges in the graph, or as initial values for the embeddings. The strongest result comes from SimplE with the Selfloops encoding, which achieves median AUROC 0.978, AUPRC 0.971, and AP@50 1.000 across 963 side effects, and reaches 98.3% of its maximum performance after only two training epochs (about 4 minutes). The work shows that carefully tuned tensor-factorisation methods can rival state-of-the-art graph neural networks while training far faster and being easier to reproduce. The code is public, and the authors identify hypergraph approaches to multi-drug interactions as a direction for future work.
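AP@50 is a ranking metric computed over the 50 highest-scored predictions. The sketch below shows one common definition, given only as an illustration: the paper's exact formula is not stated in this summary, so the normalisation by min(number of positives, k) and the name `average_precision_at_k` are assumptions.

```python
def average_precision_at_k(scores, labels, k=50):
    """Average precision over the top-k ranked predictions.

    scores: model scores, higher means "more likely a true side effect".
    labels: 0/1 ground-truth labels aligned with scores.
    Assumption: normalise by min(number of positives, k).
    """
    # Rank predictions by descending score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, y in ranked if y)
    if n_pos == 0:
        return 0.0

    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked[:k], start=1):
        if y:
            hits += 1
            # Precision at this rank, accumulated only at true positives.
            precision_sum += hits / rank
    return precision_sum / min(n_pos, k)
```

Under this definition, and when at least k positives exist, a score of 1.0 means every one of the top-k predictions is a true positive.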
Abstract
Motivation: Adverse reactions from drug combinations are increasingly common, making their accurate prediction a crucial challenge in modern medicine. Laboratory-based identification of these reactions is insufficient due to the combinatorial nature of the problem. While many computational approaches have been proposed, tensor factorisation models have shown mixed results, necessitating a thorough investigation of their capabilities when properly optimized.
Results: We demonstrate that tensor factorisation models can achieve state-of-the-art performance on polypharmacy side effect prediction, with our best model (SimplE) achieving median scores of 0.978 AUROC, 0.971 AUPRC, and 1.000 AP@50 across 963 side effects. Notably, this model reaches 98.3% of its maximum performance after just two epochs of training (approximately 4 minutes), making it substantially faster than existing approaches while maintaining comparable accuracy. We also find that incorporating monopharmacy data as self-looping edges in the graph performs marginally better than using it to initialize embeddings.
Availability and Implementation: All code used in the experiments is available in our GitHub repository (https://doi.org/10.5281/zenodo.10684402). The implementation was carried out using Python 3.8.12 with PyTorch 1.7.1, accelerated with CUDA 11.4 on NVIDIA GeForce RTX 2080 Ti GPUs.
Contact: oliver.lloyd@bristol.ac.uk
Supplementary information: Supplementary data, including precision-recall curves and F1 curves for the best performing model, are available at Bioinformatics online.
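The two ideas named in the Results, SimplE scoring and the self-loop encoding of monopharmacy data, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' LibKGE implementation. The scoring function follows the published SimplE formulation, in which each entity has a head-role and a tail-role vector and each relation has a forward and an inverse vector. The helper names (`simple_score`, `add_self_loops`) and the drug and side-effect identifiers are hypothetical.

```python
def simple_score(h_head, h_tail, t_head, t_tail, r_fwd, r_inv):
    """SimplE score for a triple (h, r, t).

    h_head / h_tail: embeddings of entity h in its head and tail roles.
    t_head / t_tail: embeddings of entity t in its head and tail roles.
    r_fwd / r_inv:   embeddings of the relation and of its inverse.
    The score averages two three-way (trilinear) dot products.
    """
    forward = sum(a * b * c for a, b, c in zip(h_head, r_fwd, t_tail))
    inverse = sum(a * b * c for a, b, c in zip(t_head, r_inv, h_tail))
    return 0.5 * (forward + inverse)


def add_self_loops(poly_triples, mono_pairs):
    """Encode monopharmacy side effects as self-looping edges.

    poly_triples: (drug_a, side_effect, drug_b) polypharmacy edges.
    mono_pairs:   (drug, side_effect) single-drug side effects.
    Each monopharmacy pair becomes an edge from the drug to itself.
    """
    triples = list(poly_triples)
    for drug, effect in mono_pairs:
        triples.append((drug, effect, drug))
    return triples


# Hypothetical identifiers, for illustration only.
graph = add_self_loops(
    [("drug_1", "side_effect_x", "drug_2")],
    [("drug_1", "side_effect_y")],
)
```

With self-loops, single-drug side effects become ordinary triples, so the same scoring function and training loop cover both kinds of data without a separate embedding-initialisation step.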
