HODDI: A Dataset of High-Order Drug-Drug Interactions for Computational Pharmacovigilance
Zhaoying Wang, Yingdan Shi, Xiang Liu, Can Chen, Jun Wen, Ren Wang
TL;DR
HODDI addresses the scarcity of higher-order drug–drug interaction data with a dataset derived from FAERS: 109,744 records spanning 41 quarters and covering 2,506 drugs and 4,569 side effects. The authors design a construction pipeline that includes SapBERT-based side-effect labeling, DrugBank mapping, and positive/negative sampling, and they evaluate multiple architectures on three evaluation subsets. They find that higher-order information substantially improves prediction: hypergraph-based models such as HGNN-SA achieve the best performance, while simple MLPs can still perform well when given higher-order features. The dataset and benchmark framework provide a foundation for research in pharmacovigilance, polypharmacy safety, and personalized medicine using higher-order modeling approaches.
Abstract
Drug-side effect research is vital for understanding adverse reactions arising in complex multi-drug therapies. However, the scarcity of higher-order datasets that capture the combinatorial effects of multiple drugs severely limits progress in this field. Existing resources such as TWOSIDES primarily focus on pairwise interactions. To fill this critical gap, we introduce HODDI, the first Higher-Order Drug-Drug Interaction Dataset, constructed from U.S. Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) records spanning the past decade, to advance computational pharmacovigilance. HODDI contains 109,744 records involving 2,506 unique drugs and 4,569 unique side effects, specifically curated to capture multi-drug interactions and their collective impact on adverse effects. Comprehensive statistical analyses demonstrate HODDI's extensive coverage and robust analytical metrics, making it a valuable resource for studying higher-order drug relationships. Evaluating HODDI with multiple models, we find that a simple Multi-Layer Perceptron (MLP) can outperform graph models, while hypergraph models demonstrate superior performance in capturing complex multi-drug interactions, further validating HODDI's effectiveness. Our findings highlight the inherent value of higher-order information in drug-side effect prediction and position HODDI as a benchmark dataset for advancing research in pharmacovigilance, drug safety, and personalized medicine. The dataset and code are available at https://github.com/TIML-Group/HODDI.
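To make the contrast between pairwise and higher-order representations concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a hypothetical toy record format (a list of co-administered drugs plus one reported side effect); the drug names, field names, and helper functions are illustrative only and do not come from HODDI. Each record becomes one hyperedge in an incidence matrix, and the same records are also flattened into drug pairs for comparison.

```python
from itertools import combinations

# Hypothetical toy records (not taken from HODDI): each record lists the
# drugs reported together and one associated side effect.
records = [
    {"drugs": ["aspirin", "warfarin", "ibuprofen"], "side_effect": "bleeding"},
    {"drugs": ["aspirin", "warfarin"], "side_effect": "bruising"},
    {"drugs": ["metformin", "lisinopril", "aspirin", "warfarin"], "side_effect": "dizziness"},
]


def build_incidence(records):
    """Return (sorted drug names, H) where H[i][j] = 1 if drug i appears in record j."""
    drugs = sorted({d for r in records for d in r["drugs"]})
    index = {d: i for i, d in enumerate(drugs)}
    H = [[0] * len(records) for _ in drugs]
    for j, r in enumerate(records):
        for d in r["drugs"]:
            H[index[d]][j] = 1
    return drugs, H


def clique_expansion(records):
    """Pairwise view: every drug pair that co-occurs in any record becomes one edge."""
    edges = set()
    for r in records:
        for a, b in combinations(sorted(r["drugs"]), 2):
            edges.add((a, b))
    return edges


drugs, H = build_incidence(records)
edges = clique_expansion(records)

# Hyperedge sizes are the column sums of H: one column per record.
hyperedge_sizes = [sum(col) for col in zip(*H)]
print("drugs:", drugs)
print("hyperedge sizes:", hyperedge_sizes)
print("pairwise edges:", len(edges))
```

In this toy example the pair (aspirin, warfarin) appears in all three records, each with a different side effect, yet the pairwise view collapses it into a single edge. The incidence matrix keeps one column per record, so the full drug combination behind each side effect stays recoverable.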
