Knockoffs for exchangeable categorical covariates

Emanuela Dreassi, Luca Pratelli, Pietro Rigo

Abstract

Let $X=(X_1,\ldots,X_p)$ be a $p$-variate random vector and $F$ a fixed finite set. In many applications, mainly in genetics, $X_i\in F$ for each $i=1,\ldots,p$. Despite this, to obtain a knockoff $\widetilde{X}$ (in the sense of \cite{CFJL18}), $X$ is usually modeled as an absolutely continuous random vector. While understandable from the point of view of applications, this approximate procedure does not make sense theoretically, since $X$ is supported by the finite set $F^p$. In this paper, explicit formulae for the joint distribution of $(X,\widetilde{X})$ are provided when $P(X\in F^p)=1$ and $X$ is exchangeable or partially exchangeable. Indeed, when $X_i\in F$ for all $i$, there seem to be various reasons for assuming that $X$ is exchangeable or partially exchangeable. The robustness of $\widetilde{X}$ with respect to the de Finetti measure $\pi$ of $X$ is investigated as well. Let $\mathcal{L}_{\pi}(\widetilde{X}\mid X=x)$ denote the conditional distribution of $\widetilde{X}$, given $X=x$, when the de Finetti measure is $\pi$. It is shown that $$\lVert\mathcal{L}_{\pi_1}(\widetilde{X}\mid X=x)-\mathcal{L}_{\pi_2}(\widetilde{X}\mid X=x)\rVert\le c(x)\,\lVert\pi_1-\pi_2\rVert$$ where $\lVert\cdot\rVert$ is the total variation distance and $c(x)$ is a suitable constant. Finally, a numerical experiment is performed. Overall, the knockoffs of this paper outperform the alternatives (i.e., the knockoffs obtained by giving $X$ an absolutely continuous distribution) in terms of the false discovery rate, but are slightly weaker in terms of power.
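
To make the two sides of the robustness inequality concrete, the following is a minimal sketch, not the authors' code. It assumes binary covariates ($F=\{0,1\}$), a de Finetti measure supported on a finite grid of values $\theta_k\in(0,1)$, and a knockoff $\widetilde{X}$ that is conditionally i.i.d. Bernoulli($\theta$) given $\theta$, independently of $X$. This construction is an assumption chosen for illustration, and so are the function names `conditional_law` and `total_variation` and the normalization $\tfrac12\sum|\cdot|$ of the total variation distance. The constant $c(x)$ is not specified in the abstract, so the sketch only computes the two distances.

```python
import itertools
import math


def conditional_law(x, thetas, prior):
    """Law of X~ given X = x on {0,1}^p, listed in itertools.product order.

    Assumption: given theta, X and X~ are independent with i.i.d.
    Bernoulli(theta) coordinates, and theta has weights `prior` on `thetas`.
    """
    p = len(x)
    n1 = sum(x)          # number of ones in x
    n0 = p - n1          # number of zeros in x
    # Posterior weights of theta given X = x (Bayes' rule on the grid).
    weights = [w * t**n1 * (1 - t)**n0 for t, w in zip(thetas, prior)]
    z = sum(weights)
    posterior = [w / z for w in weights]
    law = []
    for xt in itertools.product((0, 1), repeat=p):
        m1 = sum(xt)
        law.append(sum(q * t**m1 * (1 - t)**(p - m1)
                       for t, q in zip(thetas, posterior)))
    return law


def total_variation(mu, nu):
    """Total variation distance between two pmfs listed in the same order."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))


# Two discrete priors on the same grid: uniform and binomial-shaped weights.
thetas = [k / 10 for k in range(1, 10)]
pi1 = [1 / len(thetas)] * len(thetas)
raw = [math.comb(8, k) for k in range(9)]
pi2 = [r / sum(raw) for r in raw]

x = (1, 0, 1, 1)
law1 = conditional_law(x, thetas, pi1)
law2 = conditional_law(x, thetas, pi2)
print("TV between conditional laws:", total_variation(law1, law2))
print("TV between priors:          ", total_variation(pi1, pi2))
```

Under these assumptions, the ratio of the two printed values gives a numerical lower bound on any admissible constant $c(x)$ for this particular $x$ and pair of priors.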

Paper Structure

This paper contains 16 sections, 6 theorems, 94 equations, 5 figures.

Key Result

Theorem 1

Assume condition ni9x2 and denote by $\widetilde{X}$ any $p$-variate random vector. Then, $\widetilde{X}$ is a knockoff provided for all $x,\,\widetilde{x}\in\{0,1\}^p$, where $\widetilde{n}_j=n_j(\widetilde{x})=\sum_{i=1}^p\textbf{1}(\widetilde{x}_i=j)$.
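
The counts $n_j(x)$ and the swap property that defines a knockoff can be illustrated with a short sketch. It is hedged throughout: it does not reproduce the formula of Theorem 1, which is not legible in this listing. Instead it assumes a Beta($a$, $b$) de Finetti measure and takes all $2p$ coordinates of $(X,\widetilde{X})$ to be i.i.d. Bernoulli($\theta$) given $\theta$, so that the joint probability depends on $(x,\widetilde{x})$ only through $n_j+\widetilde{n}_j$. The names `counts`, `joint_pmf`, `swap`, and `max_swap_violation` are hypothetical helpers introduced for illustration.

```python
import itertools
import math


def counts(x, levels=(0, 1)):
    """Return n_j(x): the number of coordinates of x equal to j, per level j."""
    return {j: sum(1 for xi in x if xi == j) for j in levels}


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def joint_pmf(x, x_tilde, a=1.0, b=1.0):
    """P(X = x, X~ = x~) under the assumed construction.

    Assumption: theta ~ Beta(a, b) and, given theta, the 2p coordinates of
    (X, X~) are i.i.d. Bernoulli(theta). The integral over theta is a ratio
    of Beta functions.
    """
    n, n_t = counts(x), counts(x_tilde)
    ones = n[1] + n_t[1]
    zeros = n[0] + n_t[0]
    return math.exp(log_beta(a + ones, b + zeros) - log_beta(a, b))


def swap(x, x_tilde, subset):
    """Exchange x_i and x~_i for every index i in `subset`."""
    x, x_tilde = list(x), list(x_tilde)
    for i in subset:
        x[i], x_tilde[i] = x_tilde[i], x[i]
    return tuple(x), tuple(x_tilde)


def max_swap_violation(p, a=1.0, b=1.0):
    """Largest change in the joint pmf over all points and all swaps."""
    points = list(itertools.product((0, 1), repeat=p))
    subsets = [s for r in range(p + 1)
               for s in itertools.combinations(range(p), r)]
    worst = 0.0
    for x in points:
        for xt in points:
            base = joint_pmf(x, xt, a, b)
            for s in subsets:
                swapped = joint_pmf(*swap(x, xt, s), a, b)
                worst = max(worst, abs(base - swapped))
    return worst


print(counts((1, 0, 1)))
print(max_swap_violation(3, a=2.0, b=3.0))
```

Because swapping $x_i$ with $\widetilde{x}_i$ leaves $n_j+\widetilde{n}_j$ unchanged, the joint probability in this sketch is invariant under every swap, which is the distributional requirement on a knockoff in the sense of \cite{CFJL18}. The conditional independence of $\widetilde{X}$ and the response given $X$ is a separate requirement that the sketch does not check.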

Figures (5)

  • Figure 1: Simulated data for 2-valued exchangeable covariates. Diffuse priors: Beta distributions with equal parameters
  • Figure 2: Simulated data for exchangeable 2-valued covariates. Discrete priors: Uniform and binomial
  • Figure 3: Real data (HIV-1 dataset) for exchangeable 2-valued covariates. Priors: Beta, binomial and discrete uniform
  • Figure 4: Simulated data for exchangeable 3-valued covariates. Dirichlet priors with equal parameters
  • Figure 5: Simulated data for partially exchangeable 2-valued covariates. Discrete priors with different choices of $f(u)$

Theorems & Definitions (14)

  • Theorem 1
  • proof
  • Remark 2
  • Theorem 3
  • proof
  • Example 4
  • Theorem 5
  • proof
  • Lemma 6
  • proof
  • ...and 4 more