A Nonparametric Bayesian Local-Global Model for Enhanced Adverse Event Signal Detection in Spontaneous Reporting System Data

Xin-Wei Huang, Saptarshi Chakraborty

Abstract

Spontaneous reporting system databases are key resources for post-marketing surveillance, providing real-world evidence (RWE) on the adverse events (AEs) of regulated drugs or other medical products. Various statistical methods have been proposed for AE signal detection in these databases, flagging drug-specific AEs with disproportionately high observed counts compared to expected counts under independence. However, signal detection remains challenging for rare AEs or newer drugs, which receive small observed and expected counts and thus suffer from reduced statistical power. Principled information sharing on signal strengths across drugs/AEs is crucial in such cases to enhance signal detection. However, existing methods typically ignore complex between-drug associations on AE signal strengths, limiting their ability to detect signals. We propose novel local-global mixture Dirichlet process (DP) prior-based nonparametric Bayesian models to capture these associations, enabling principled information sharing between drugs while balancing flexibility and shrinkage for each drug, thereby enhancing statistical power. We develop efficient Markov chain Monte Carlo algorithms for implementation and employ a false discovery rate (FDR)-controlled, false negative rate (FNR)-optimized hypothesis testing framework for AE signal detection. Extensive simulations demonstrate our methods' superior sensitivity -- often surpassing existing approaches by a twofold or greater margin -- while strictly controlling the FDR. An application to FDA FAERS data on statin drugs further highlights our methods' effectiveness in real-world AE signal detection. Software implementing our methods is provided as supplementary material.
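
For readers less familiar with disproportionality analysis, the sketch below illustrates two ingredients named in the abstract. It computes expected counts under independence, $E_{ij} = n_{i\cdot} n_{\cdot j} / n_{\cdot\cdot}$, from an AE-by-drug count table. It then applies one standard Bayesian FDR rule to posterior null probabilities estimated from MCMC draws of $\lambda_{ij}$. Several pieces are assumptions for illustration: the null hypothesis $\lambda_{ij} \le 1$, the decision rule (flag cells in increasing order of posterior null probability while their running average stays at or below $\alpha$), and the function names `expected_counts`, `posterior_null_prob`, and `bayes_fdr_reject`. The paper's exact testing procedure may differ.

```python
from typing import List, Sequence


def expected_counts(n: Sequence[Sequence[float]]) -> List[List[float]]:
    """Expected counts under independence: E_ij = n_i. * n_.j / n_..

    n is an I x J table of report counts (rows = AEs, columns = drugs).
    """
    row_tot = [sum(row) for row in n]          # n_i.: total reports of AE i
    col_tot = [sum(col) for col in zip(*n)]    # n_.j: total reports of drug j
    grand = sum(row_tot)                       # n_..: total number of reports
    return [[r * c / grand for c in col_tot] for r in row_tot]


def posterior_null_prob(draws: Sequence[float], threshold: float = 1.0) -> float:
    """Monte Carlo estimate of P(lambda_ij <= threshold | data) from MCMC draws.

    The null 'lambda_ij <= 1' (no signal) is an illustrative assumption.
    """
    return sum(d <= threshold for d in draws) / len(draws)


def bayes_fdr_reject(p_null: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag the largest set of cells whose average posterior null probability
    (the posterior expected FDR) is <= alpha, taking cells in increasing order
    of p_null. Because the sorted running mean never decreases, the loop can
    stop at the first cell that would push it above alpha.
    """
    order = sorted(range(len(p_null)), key=lambda k: p_null[k])
    flagged = [False] * len(p_null)
    total, count = 0.0, 0
    for k in order:
        if (total + p_null[k]) / (count + 1) > alpha:
            break
        total += p_null[k]
        count += 1
        flagged[k] = True
    return flagged
```

For example, `expected_counts([[10, 5], [2, 3]])` gives `[[9.0, 6.0], [3.0, 2.0]]`, so the first AE-drug cell has an observed-to-expected ratio of 10/9.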

Paper Structure

This paper contains 30 sections, 10 equations, 7 figures, 9 tables, 3 algorithms.

Figures (7)

  • Figure 1: Kendall's $\tau$ correlation matrix for occurrence counts of 1491 commonly occurring AEs (PTs) among six statin drugs in the FDA FAERS data (2014 Q1-2020 Q4).
  • Figure 2: Schematic diagrams visualizing the mechanisms of a "local only" DP prior with one drug (Panel A) and a local-global mixture DP prior with $J = 3$ drugs (Panel B). Throughout, darker shades represent larger values. Panel A: The observed relative reporting rates $n_{ij}/E_{ij}$ for $I=6$ AEs for drug $j$ (Panel A.1) show high variability across $i$ (different shades of green) due to randomness in the observed data. The atoms $\theta_{hj}$ (Panel A.2) of the DP prior cluster the $I = 6$ AE signal strengths into $K_{+} = 3$ non-empty DP clusters. This ensures information sharing: the AEs $(i = 1, i = 3, i = 5)$ in cluster 1, and separately the AEs $(i = 2, i = 6)$ in cluster 2 (cluster memberships are visualized via arrows between Panels A.1 and A.2), share their observed $(n_{ij}, E_{ij})$ information to inform their cluster-specific DP atoms (a single atom per cluster). These DP atoms, together with the cluster memberships, produce the final $I=6$ AE signal strengths $\lambda_{ij}$ (Panel A.3). Panel B: The noisy observed data $(n_{ij}, E_{ij})$, which lead to noisy observed relative reporting rates $n_{ij}/E_{ij}$ (Panel B.1; different shades of green), are used to produce "local" signal strengths $\lambda_{ij}^l$ for $I = 6$ AEs and $J=3$ drugs using a separate, independent local DP prior for each drug (Panel B.2.1), and "global" AE signal strengths $\lambda_i^g$ shared by all $J = 3$ drugs based on a common global DP prior (Panel B.2.2). Probabilistic two-component mixtures of $\lambda_{ij}^l$ and $\lambda_{i}^g$ (Panel B.3), governed by the local distribution indicators $z_{ij}$, produce the final realized signal strengths $\lambda_{ij}$ (Panel B.4).
  • Figure 3: Simulation results. The horizontal axis shows the true signal strength $\lambda_0$ for signal cells, and the vertical axis shows the value of each evaluation metric. Each column presents one metric, and each row presents one simulation scenario listed in the simulation scenarios table.
  • Figure 4: Heatmaps of 15 selected PTs. Cells not detected as signals by our method are colored gray. White-to-blue shades represent the posterior median of the signal strength parameter $\lambda_{ij}$, with darker (lighter) blue indicating a stronger (weaker) signal. Panel A contains 5 commonly reported PTs across all statin drugs. Panel B contains 5 semi-rare PTs with signals newly discovered by our proposed method. Panel C contains 5 rare PTs with signals newly discovered by our proposed method. The observed count $n$, the expected count $E$, and the 90% posterior credible interval of $\lambda_{ij}$ are shown in each cell. The letters "H", "B", "G", and "L" indicate signal detection by the DP method of Hu et al., BCPNN, GPS, and the likelihood ratio test, respectively.
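
To make the mechanism in the Figure 2 caption concrete, the following is a minimal generative sketch of a local-global mixture of DP priors. It simulates from a prior only; it does not fit a model to data, which is what the paper's MCMC algorithms do. The sketch assumes truncated stick-breaking DPs with `K` atoms, a Gamma base measure, a single mixing probability `pi` for the indicators $z_{ij}$ (with $z_{ij} = 1$ selecting the local strength), and Poisson counts $n_{ij} \sim \text{Poisson}(E_{ij} \lambda_{ij})$. These modeling choices and the names `stick_breaking` and `sample_local_global` are hypothetical and are not the authors' specification.

```python
import numpy as np


def stick_breaking(alpha, K, rng):
    """Truncated stick-breaking weights of a DP(alpha) with K atoms."""
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # close the stick so the K weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    w = v * remaining
    return w / w.sum()  # guard against floating-point drift


def sample_local_global(E, alpha_l=1.0, alpha_g=1.0, pi=0.5, K=20,
                        shape=1.0, rate=1.0, seed=0):
    """Simulate (lambda, z, n) from a hypothetical local-global mixture DP prior.

    E is an I x J array of expected counts (rows = AEs, columns = drugs).
    """
    rng = np.random.default_rng(seed)
    I, J = E.shape

    # Global DP: one strength lambda_i^g per AE, shared by all J drugs.
    w_g = stick_breaking(alpha_g, K, rng)
    atoms_g = rng.gamma(shape, 1.0 / rate, size=K)   # Gamma base measure (assumed)
    lam_g = atoms_g[rng.choice(K, size=I, p=w_g)]

    # Local DPs: an independent DP for each drug j gives lambda_ij^l.
    lam_l = np.empty((I, J))
    for j in range(J):
        w_l = stick_breaking(alpha_l, K, rng)
        atoms_l = rng.gamma(shape, 1.0 / rate, size=K)
        lam_l[:, j] = atoms_l[rng.choice(K, size=I, p=w_l)]

    # Indicators (assumed coding): z_ij = 1 selects local, z_ij = 0 selects global.
    z = rng.binomial(1, pi, size=(I, J))
    lam = np.where(z == 1, lam_l, lam_g[:, None])

    # Counts: n_ij ~ Poisson(E_ij * lambda_ij) (assumed likelihood).
    n = rng.poisson(E * lam)
    return lam, z, n
```

For instance, `sample_local_global(np.full((6, 3), 2.0))` mirrors the $I = 6$, $J = 3$ setting of Figure 2. Within each row, every cell with $z_{ij} = 0$ takes the same global value, which is the between-drug sharing the caption describes, while cells with $z_{ij} = 1$ keep drug-specific values drawn from that drug's own DP.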

Theorems & Definitions (4)

  • Remark
  • Remark
  • Remark
  • Remark