Dirichlet process mixtures of block $g$ priors for model selection and prediction in linear models

Anupreet Porwal, Abel Rodriguez

TL;DR

Dirichlet process mixtures of block $g$ priors are shown to be consistent in various senses and, in particular, to avoid the conditional Lindley ``paradox'' highlighted by Som et al. (2016).

Abstract

This paper introduces Dirichlet process mixtures of block $g$ priors for model selection and prediction in linear models. These priors extend traditional mixtures of $g$ priors by allowing differential shrinkage for various (data-selected) blocks of parameters while fully accounting for the predictors' correlation structure, providing a bridge between the literatures on model selection and continuous shrinkage priors. We show that Dirichlet process mixtures of block $g$ priors are consistent in various senses and, in particular, that they avoid the conditional Lindley ``paradox'' highlighted by Som et al. (2016). Further, we develop a Markov chain Monte Carlo algorithm for posterior inference that requires only minimal ad-hoc tuning. Finally, we investigate the empirical performance of the prior in various real and simulated datasets. In the presence of a small number of very large effects, Dirichlet process mixtures of block $g$ priors lead to higher power for detecting smaller but significant effects with only a minimal increase in the number of false discoveries.

Paper Structure

This paper contains 24 sections, 6 theorems, 57 equations, 11 figures, 1 table.

Key Result

Theorem 4.1

Let $\nu_{+}$ be the largest eigenvalue of $\boldsymbol{X}_{\boldsymbol{\gamma}}^T\boldsymbol{X}_{\boldsymbol{\gamma}}$ and $\lambda_{-}(\boldsymbol{G}_{\boldsymbol{\gamma}})$ be the smallest eigenvalue of $\boldsymbol{X}_{\boldsymbol{\gamma}}^T \boldsymbol{X}_{\boldsymbol{\gamma}} - \ldots$ ... for all $p_{\boldsymbol{\gamma}} \le p$.

Figures (11)

  • Figure 1: Empirical illustration of the conditional Lindley paradox under hyper-$g/n$ prior. Thin grey lines correspond to 100 simulated datasets, while the thick blue line corresponds to the average.
  • Figure 2: Scatterplots of random samples from the Dirichlet process mixture of block $g$ priors and some related distributions in the bivariate case under a hyper-$g/n$ distribution for the shrinkage parameter(s). Panel (a) corresponds to the (elliptical) contours of the standard $g$ prior of Liang et al. (2008). Panel (b) shows the density of the prior proposed by Som et al. (2016), which assumes that blocks are orthogonal a priori. Panel (c) corresponds to a global-local $g$ prior in which each covariate is assigned its own independent shrinkage factor and the prior covariance matrix is proportional to $\left( \boldsymbol{X}^T\boldsymbol{X}\right)^{-1}$. Panel (d) is our DP mixture of block $g$ priors, which in this case corresponds to a mixture of the distributions in panels (a) and (c).
  • Figure 3: Behavior of $\log\left( B_{a,0}(\boldsymbol{y})\right)$ (left column) and $\mathsf{Pr}(\xi_1 \ne \xi_2 \mid \boldsymbol{y})$ (right column) under the DP mixture of block $g$ priors in our first simulation study. Each thin grey line corresponds to one replicate of the simulation, while the thicker blue line corresponds to the mean curve. Figures in the top row correspond to design matrices generated under $\eta=0$, while the bottom row corresponds to $\eta = 0.5$.
  • Figure 4: $F_1$ scores for model selection procedures based on various priors for our second simulation study.
  • Figure 5: Prediction MSE for $\eta = 0$ and $\eta=0.5$.
  • ...and 6 more figures
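
To make the baseline in panel (a) of Figure 2 concrete, the following is a minimal sketch of drawing coefficient vectors from the standard $g$ prior, $\boldsymbol{\beta} \mid g, \sigma^2 \sim \mathsf{N}\left(\boldsymbol{0}, g \sigma^2 (\boldsymbol{X}^T\boldsymbol{X})^{-1}\right)$, with $g$ drawn from a hyper-$g/n$ prior. It is not the authors' code and does not implement the DP mixture of block $g$ priors itself. The function names, the default $a = 3$, and the fixed $\sigma^2 = 1$ are assumptions made for illustration; the sampler relies on the fact that $(g/n)/(1+g/n)$ follows a $\mathsf{Beta}(1, a/2 - 1)$ distribution under the hyper-$g/n$ prior.

```python
import numpy as np


def sample_hyper_g_n(n, a=3.0, size=1, rng=None):
    """Draw g from the hyper-g/n prior p(g) = (a-2)/(2n) * (1 + g/n)^(-a/2).

    Uses the change of variables u = (g/n) / (1 + g/n) ~ Beta(1, a/2 - 1).
    The default a = 3 is an illustrative assumption; a must exceed 2.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.beta(1.0, a / 2.0 - 1.0, size=size)
    # Invert the transformation: g = n * u / (1 - u).
    return n * u / (1.0 - u)


def sample_g_prior(X, sigma2=1.0, a=3.0, size=1, rng=None):
    """Draw `size` coefficient vectors from the standard g prior.

    Each draw uses its own g from the hyper-g/n prior, so the marginal
    distribution is a scale mixture of normals with elliptical contours
    shaped by (X^T X)^{-1}. sigma2 is held fixed here for simplicity.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    g = sample_hyper_g_n(n, a=a, size=size, rng=rng)
    # Cholesky factor L with L @ L.T = (X^T X)^{-1}.
    L = np.linalg.cholesky(np.linalg.inv(X.T @ X))
    z = rng.standard_normal((size, p))
    # Row i is sqrt(g_i * sigma2) * L @ z_i, with covariance g_i * sigma2 * (X^T X)^{-1}.
    return np.sqrt(g * sigma2)[:, None] * (z @ L.T)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 2))  # hypothetical bivariate design matrix
    draws = sample_g_prior(X, size=1000, rng=rng)
    print(draws.shape)
```

Because the hyper-$g/n$ prior is heavy-tailed, a scatterplot of such draws shows a dense core with occasional very large values, so plotting on a truncated range is usually more informative.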

Theorems & Definitions (7)

  • Theorem 4.1
  • Theorem 4.2
  • Definition 4.1: Refinement of a partition
  • Theorem 4.3
  • Theorem 4.4
  • Theorem 4.5
  • Theorem 4.6