HODDI: A Dataset of High-Order Drug-Drug Interactions for Computational Pharmacovigilance

Zhaoying Wang, Yingdan Shi, Xiang Liu, Can Chen, Jun Wen, Ren Wang

TL;DR

The paper addresses the scarcity of higher-order drug–drug interaction data by introducing HODDI, a FAERS-derived dataset containing 109,744 records across 41 quarters (2014Q3-2024Q3), 2,506 drugs, and 4,569 side effects. The authors describe a construction pipeline that includes SapBERT-based side-effect labeling, DrugBank mapping, and positive/negative sampling, and they evaluate multiple architectures on three evaluation subsets. They find that higher-order information substantially improves prediction: hypergraph-based models such as HGNN-SA achieve the best performance, while a simple MLP can still outperform graph models when it uses higher-order features. The dataset and benchmark framework are intended as a foundation for research in pharmacovigilance, polypharmacy safety, and personalized medicine.

Abstract

Drug-side effect research is vital for understanding adverse reactions arising in complex multi-drug therapies. However, the scarcity of higher-order datasets that capture the combinatorial effects of multiple drugs severely limits progress in this field. Existing resources such as TWOSIDES primarily focus on pairwise interactions. To fill this critical gap, we introduce HODDI, the first Higher-Order Drug-Drug Interaction Dataset, constructed from U.S. Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) records spanning the past decade, to advance computational pharmacovigilance. HODDI contains 109,744 records involving 2,506 unique drugs and 4,569 unique side effects, specifically curated to capture multi-drug interactions and their collective impact on adverse effects. Comprehensive statistical analyses demonstrate HODDI's extensive coverage and robust analytical metrics, making it a valuable resource for studying higher-order drug relationships. Evaluating HODDI with multiple models, we found that a simple Multi-Layer Perceptron (MLP) can outperform graph models, while hypergraph models demonstrate superior performance in capturing complex multi-drug interactions, further validating HODDI's effectiveness. Our findings highlight the inherent value of higher-order information in drug-side effect prediction and position HODDI as a benchmark dataset for advancing research in pharmacovigilance, drug safety, and personalized medicine. The dataset and code are available at https://github.com/TIML-Group/HODDI.

Paper Structure

This paper contains 34 sections, 2 figures, 13 tables.

Figures (2)

  • Figure 1: Distribution analysis of medication records and adverse event frequencies in the HODDI dataset (2014Q3-2024Q3). (a) Distribution of drug counts per record (#Drug/Record), showing a concentration between 2-8 drugs per record; (b) Frequency distribution of adverse event occurrences (#Occurrence) in positive samples, with most events occurring 1-50 times; (c) Distribution of adverse event occurrences (#Occurrence) in negative samples, displaying a normal-like distribution centered around 20-30 occurrences. The vertical dashed lines in (a) and (b) mark the intervals of 2-8 #Drug/Record and 5-50 #Occurrence, respectively. These intervals were selected as the filtering criteria for the evaluation set because they contain the most records and the most representative higher-order relationships.
  • Figure 2: Temporal trends in the number of records for each drug condition (1-3) and each cosine similarity range (< 0.8, 0.8-0.9, > 0.9) in the HODDI dataset from 2014Q3 to 2024Q3. The horizontal axis shows the number of records (× 10³), with drug conditions displayed on the left and cosine similarity ranges on the right. The vertical axis shows the quarterly time periods of the medication records, from 2014Q3 to 2024Q3. The stacked bars show how records are distributed across the categories in each quarter.
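
The evaluation-set filter described in the Figure 1 caption (2-8 drugs per record, 5-50 occurrences per adverse event) could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the record layout (a dict with hypothetical `drugs` and `side_effect` keys), the function name `filter_records`, the inclusive bounds, and the choice to count occurrences over all input records before filtering are all assumptions.

```python
from collections import Counter


def filter_records(records, drug_range=(2, 8), occurrence_range=(5, 50)):
    """Keep records whose drug count and side-effect frequency both fall
    inside the given inclusive ranges.

    `records` is assumed to be a list of dicts with a "drugs" list and a
    "side_effect" string; this layout is hypothetical.
    """
    # Assumption: an adverse event's occurrence count is the number of
    # input records that report it, computed before any filtering.
    occurrences = Counter(r["side_effect"] for r in records)
    min_drugs, max_drugs = drug_range
    min_occ, max_occ = occurrence_range
    return [
        r
        for r in records
        # Count distinct drugs so a repeated drug name is not double-counted.
        if min_drugs <= len(set(r["drugs"])) <= max_drugs
        and min_occ <= occurrences[r["side_effect"]] <= max_occ
    ]


# Toy data: "nausea" is reported 6 times, "rash" only once.
toy_records = (
    [{"drugs": ["A", "B"], "side_effect": "nausea"} for _ in range(5)]
    + [{"drugs": ["A"], "side_effect": "nausea"}]
    + [{"drugs": ["A", "B", "C"], "side_effect": "rash"}]
)
kept = filter_records(toy_records)
```

In this toy example, the single-drug record is dropped by the drug-count bound and the "rash" record is dropped by the occurrence bound, leaving the five two-drug "nausea" records.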