Power of Knockoff: The Impact of Ranking Algorithm, Augmented Design, and Symmetric Statistic

Zheng Tracy Ke, Jun S. Liu, Yucong Ma

TL;DR

The paper compares the power of knockoff with that of its prototype, a method that uses the same ranking algorithm but has access to an ideal threshold. The comparison reveals the additional price one pays by finding a data-driven threshold to control FDR.

Abstract

The knockoff filter is a recent false discovery rate (FDR) control method for high-dimensional linear models. We point out that knockoff has three key components: ranking algorithm, augmented design, and symmetric statistic, and each component admits multiple choices. By considering various combinations of the three components, we obtain a collection of variants of knockoff. All these variants guarantee finite-sample FDR control, and our goal is to compare their power. We assume a Rare and Weak signal model on regression coefficients and compare the power of different variants of knockoff by deriving explicit formulas for the false positive rate and the false negative rate. Our results provide new insights on how to improve power when controlling FDR at a targeted level. We also compare the power of knockoff with that of its prototype, a method that uses the same ranking algorithm but has access to an ideal threshold. The comparison reveals the additional price one pays by finding a data-driven threshold to control FDR.

Paper Structure

This paper contains 33 sections, 18 theorems, 181 equations, 13 figures, 2 tables.

Key Result

Proposition 3.1

Suppose $X'X=I_p$ and consider the importance metric in BH. When $r>\vartheta$, the FDR-TPR trade-off diagram is given by $g_{\mathrm{FDR}}(u;\vartheta,r)=(u-\vartheta)_+$ and $g_{\mathrm{TPR}}(u;\vartheta,r)=(\sqrt{r}-\sqrt{u})_+^2$. The phase diagram is given by $h_{\mathrm{AFR}}(\vartheta)=\varth
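As a rough numerical illustration, the two trade-off functions stated in Proposition 3.1 can be evaluated directly. The sketch below is not code from the paper: the function names `g_fdr`, `g_tpr`, and `tradeoff_curve`, and the choice to sample $u$ on an even grid over $[0, r]$, are assumptions made for illustration only. It covers only the case $r>\vartheta$ with orthogonal design, as in the proposition.

```python
import math


def g_fdr(u, theta):
    """Evaluate (u - theta)_+, the g_FDR expression in Proposition 3.1."""
    return max(u - theta, 0.0)


def g_tpr(u, r):
    """Evaluate (sqrt(r) - sqrt(u))_+^2, the g_TPR expression in Proposition 3.1."""
    return max(math.sqrt(r) - math.sqrt(u), 0.0) ** 2


def tradeoff_curve(theta, r, num=5):
    """Return (u, g_FDR, g_TPR) triples on an even grid of u in [0, r].

    The grid over [0, r] is an illustrative choice, not taken from the paper.
    """
    if not r > theta:
        raise ValueError("Proposition 3.1 is stated for r > theta")
    if num < 2:
        raise ValueError("num must be at least 2")
    grid = [r * k / (num - 1) for k in range(num)]
    return [(u, g_fdr(u, theta), g_tpr(u, r)) for u in grid]


if __name__ == "__main__":
    # Print one trade-off curve for an arbitrary example pair (theta, r).
    for u, fdr_exp, tpr_exp in tradeoff_curve(theta=0.3, r=1.0):
        print(f"u={u:.2f}  g_FDR={fdr_exp:.3f}  g_TPR={tpr_exp:.3f}")
```

Sweeping $u$ for a fixed $(\vartheta, r)$ traces one curve of the kind shown in the left panel of Figure 2. In this sketch, $g_{\mathrm{FDR}}$ stays at zero until $u$ exceeds $\vartheta$, while $g_{\mathrm{TPR}}$ falls to zero once $u$ reaches $r$.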

Figures (13)

  • Figure 1: An illustration of the knockoff and its prototype.
  • Figure 2: Left: the FDR-TPR trade-off diagram for a few values of $(\vartheta,r)$. Right: the phase diagram. The design is orthogonal, and the importance metric is as in BH. Each FDR-TPR trade-off diagram corresponds to one point in the phase diagram.
  • Figure 3: Power comparison of knockoff with different symmetric statistics (orthogonal design; ranking algorithm is Lasso, and augmented design is such that $\mathrm{diag}(s)=I_p$). The left two panels are the phase diagrams of knockoff-diff (left) and knockoff-sgm (middle), where the dashed lines are the phase curves of their common prototype. The right panel is the FDR-TPR trade-off diagram of knockoff-sgm, where each trade-off curve corresponds to one point in the phase diagram.
  • Figure 4: The rejection region of the symmetric statistics (orthogonal design, $a=0$ in the construction of knockoff variables). Left: the signed maximum statistic. Middle: the difference statistic. Right: the prototype.
  • Figure 5: The phase diagrams of Lasso-path (block-wise diagonal designs). Left: $\rho=0.5$. Middle: $\rho=-0.5$. Right: zoom-out of the middle panel. In all three panels, the dashed lines are the phase curves for orthogonal designs ($\rho=0$), as a reference.
  • ...and 8 more figures

Theorems & Definitions (21)

  • Definition 3.1: Multi-$\log(p)$ term
  • Definition 3.2: FDR-TPR trade-off diagram
  • Definition 3.3: Phase diagram
  • Proposition 3.1
  • Theorem 4.1
  • Corollary 4.1
  • Proposition 5.1
  • Theorem 5.1
  • Corollary 5.1: Phase diagram of Lasso-path
  • Theorem 5.2: The case of $|\rho|\geq 1/2$
  • ...and 11 more