Controlling the joint local false discovery rate is more powerful than meta-analysis methods in joint analysis of summary statistics from multiple genome-wide association studies

Wei Jiang, Weichuan Yu

TL;DR

The Jlfdr-based method is proved to achieve higher power than commonly used meta-analysis methods when analyzing heterogeneous datasets from multiple GWASs, and it discovers more associations than meta-analysis methods in empirical datasets of four phenotypes.

Abstract

In genome-wide association studies (GWASs) of common diseases/traits, we often analyze multiple GWASs with the same phenotype together to discover associated genetic variants with higher power. Since it is difficult to access data with detailed individual measurements, summary-statistics-based meta-analysis methods have become popular to jointly analyze data sets from multiple GWASs. In this paper, we propose a novel summary-statistics-based joint analysis method based on controlling the joint local false discovery rate (Jlfdr). We prove that our method is the most powerful summary-statistics-based joint analysis method when controlling the false discovery rate at a certain level. In particular, the Jlfdr-based method achieves higher power than commonly used meta-analysis methods when analyzing heterogeneous data sets from multiple GWASs. Simulation experiments demonstrate the superior power of our method over meta-analysis methods. Also, our method discovers more associations than meta-analysis methods from empirical data sets of four phenotypes. The R-package is available at: http://bioinformatics.ust.hk/Jlfdr.html.

Paper Structure

This paper contains 15 sections, 4 theorems, 24 equations, 5 figures, 6 tables, 1 algorithm.

Key Result

Theorem 1

For any rejection region $\mathcal{R}$ with $\text{Fdr}(\mathcal{R})\leq q$, we have $\eta(\mathcal{R}) \leq \eta(\mathcal{R_O})$.
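
To make the statement concrete, here is a minimal sketch of a Jlfdr-thresholding rule for two studies. It rests on assumptions not stated in this overview: that $\eta$ denotes power, that $\mathcal{R_O}$ is the region where the Jlfdr falls below a threshold, that the Fdr of a region is approximated by the mean Jlfdr of the rejected SNPs, and that the z-score pairs follow a two-component zero-mean Gaussian mixture. The function names and the values of `pi0` and `cov_alt` are hypothetical and are not taken from the authors' R package.

```python
import numpy as np

def gaussian_pdf2(z, cov):
    """Density of a zero-mean bivariate normal with covariance `cov` at each row of z."""
    inv = np.linalg.inv(cov)
    det = np.linalg.det(cov)
    quad = np.einsum("ni,ij,nj->n", z, inv, z)  # row-wise quadratic form z' inv z
    return np.exp(-0.5 * quad) / (2.0 * np.pi * np.sqrt(det))

def jlfdr(z, pi0, cov_alt):
    """Posterior probability of the null for each pair (z1, z2).

    Assumed model: null pairs ~ N(0, I), associated pairs ~ N(0, cov_alt),
    with null proportion pi0.
    """
    f0 = gaussian_pdf2(z, np.eye(2))
    f1 = gaussian_pdf2(z, cov_alt)
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)

def reject_by_jlfdr(values, q):
    """Reject the SNPs with the smallest Jlfdr while their mean Jlfdr stays <= q."""
    order = np.argsort(values)
    # The running mean of an ascending sequence is nondecreasing,
    # so a binary search finds how many SNPs can be rejected.
    running_mean = np.cumsum(values[order]) / np.arange(1, len(values) + 1)
    k = np.searchsorted(running_mean, q, side="right")
    rejected = np.zeros(len(values), dtype=bool)
    rejected[order[:k]] = True
    return rejected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pi0 = 0.99                                   # hypothetical null proportion
    cov_alt = np.array([[4.0, 1.5], [1.5, 4.0]])  # hypothetical alternative covariance
    is_alt = rng.random(20000) > pi0
    z = rng.standard_normal((20000, 2))
    z[is_alt] = rng.multivariate_normal(np.zeros(2), cov_alt, size=is_alt.sum())
    rejected = reject_by_jlfdr(jlfdr(z, pi0, cov_alt), q=0.05)
    print(rejected.sum(), "rejections;", (rejected & ~is_alt).sum(), "false")
```

In this sketch the mixture parameters are supplied directly; in practice they would have to be estimated from the summary statistics, for example with an EM algorithm as the Figure 2 caption indicates.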

Figures (5)

  • Figure 1: Rejection boundaries determined by different summary-statistics-based joint analysis methods: the optimal method and the meta-analysis method. Assume we jointly analyze data from two GWASs. For simplicity, we assume the tests are one-sided. We plot each test statistic pair $(z^{(1)}, z^{(2)})$ in the coordinate plane. A SNP in the upper right corner shows a more significant association than a SNP in the bottom left corner. Truly associated SNPs are plotted as blue circles, and SNPs that are not truly associated are plotted as yellow triangles. For each rejection boundary, the SNPs in the upper right region are discovered. All three analysis methods have the same false discovery proportion (10%). The optimal method has higher empirical power (red solid line, 72%) than the meta-analysis method (purple dashed line, 36%).
  • Figure 2: (a) The average empirical power and (b) the average Fdp in the homogeneous setting ($\tau=0$) of the simulation experiment. The experiments are repeated 10 times with different sample size ratios ($n^{(2)}/n^{(1)}=0.5$, $1$ and $1.5$). The average Fdp of each of the three methods (the Jlfdr-based method (Jlfdr), the fixed-effects meta-analysis method (MetaF) and the random-effects meta-analysis method (MetaR)) is about $5\times 10^{-5}$. When controlling Fdr at the same level, the Jlfdr-based method and the fixed-effects meta-analysis method have almost the same average empirical power. The subtle differences are due to the random initialization of the EM algorithm and the Fdr approximations used in the Fdr equation of the main text.
  • Figure 3: The discovered associations in the heterogeneous setting ($\tau=0.5$) of the simulation experiment. Both the first and second studies have $10000$ individuals. For each SNP, the pair of summary statistics $(z^{(1)}, z^{(2)})$ is plotted with the transformation $(|z^{(1)}|, \operatorname{sgn}(z^{(1)})z^{(2)})$. We use light grey circles to represent the associations discovered by both the Jlfdr-based method and the fixed-effects meta-analysis method. We use black upward-pointing triangles and dark grey downward-pointing triangles to represent the associations discovered only by the Jlfdr-based method and only by the fixed-effects meta-analysis method, respectively. The rejection boundary of the Jlfdr-based method is plotted as the solid curve. The rejection boundary of the fixed-effects meta-analysis method is plotted as the dashed straight line. The Jlfdr-based method discovers more associations overall than the meta-analysis method, although it misses some associations identified by the meta-analysis method.
  • Figure 4: (a) The average empirical power and (b) the average Fdp in the heterogeneous setting ($\tau=0.5$) of the simulation experiment. We ran the experiments 20 times with different sample size ratios ($n^{(2)}/n^{(1)}=0.5$, $1$ and $1.5$). The average Fdp values of the three methods are about $5\times 10^{-5}$. When controlling Fdr at the same level, our proposed Jlfdr-based method achieves higher power than the other methods in every sample size ratio setting.
  • Figure 5: The rejection regions determined in the empirical datasets: (a) SCZ data from the PGC; (b) SLE data from dbGaP; (c) BMI data from the GIANT; (d) WHRadjBMI data from the GIANT. The datasets are described in the main text. For each SNP, the pair of summary statistics $(z^{(1)}, z^{(2)})$ is plotted with the transformation $(|z^{(1)}|, \operatorname{sgn}(z^{(1)})z^{(2)})$. We use light grey circles to represent the associations discovered by both the Jlfdr-based method and the fixed-effects meta-analysis method. We use black upward-pointing triangles and dark grey downward-pointing triangles to represent the associations discovered only by the Jlfdr-based method and only by the fixed-effects meta-analysis method, respectively.
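
The plotting transformation in Figures 3 and 5 and the straight-line boundary of the fixed-effects method can be illustrated with a short sketch. It assumes a sample-size-weighted combination of z-scores as a stand-in for the fixed-effects meta-analysis statistic; the paper's exact weighting is not given in this overview. The function names are hypothetical, and treating $\operatorname{sgn}(0)$ as $+1$ is a choice made here to avoid discarding $z^{(2)}$ when $z^{(1)}=0$.

```python
import numpy as np

def fold_pair(z1, z2):
    """Map (z1, z2) to (|z1|, sgn(z1) * z2), taking sgn(0) as +1 (an assumption)."""
    s = np.where(z1 >= 0, 1.0, -1.0)
    return np.abs(z1), s * z2

def fixed_effects_z(z1, z2, n1, n2):
    """Sample-size-weighted combined z-score (assumed form of the meta-analysis statistic)."""
    w1 = np.sqrt(n1 / (n1 + n2))
    w2 = np.sqrt(n2 / (n1 + n2))
    return w1 * z1 + w2 * z2

if __name__ == "__main__":
    z1 = np.array([-2.0, 1.5, 0.0])
    z2 = np.array([-1.0, -0.5, 2.0])
    x, y = fold_pair(z1, z2)
    # Flipping both signs together leaves |z_meta| unchanged, so a two-sided
    # threshold on |z_meta| is a straight line w1*x + w2*y = c in the folded plot.
    print(np.abs(fixed_effects_z(z1, z2, 10000, 10000)))
    print(np.abs(fixed_effects_z(x, y, 10000, 10000)))
```

Because the transformation multiplies both coordinates by the same sign, it preserves the magnitude of any linear combination of the two z-scores, which is why a two-sided linear rule still appears as a straight line in the transformed plane.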

Theorems & Definitions (4)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4