Dependent Dirichlet processes via thinning

Laura D'Angelo, Bernardo Nipoti, Andrea Ongaro

TL;DR

We introduce a novel framework for modeling a collection of samples with dependent Dirichlet processes constructed through a thinning mechanism. The resulting models reduce uncertainty in group-specific inferences while preventing excessive borrowing of information when the data indicate it is unnecessary.

Abstract

When analyzing data from multiple sources, it is often convenient to strike a careful balance between two goals: capturing the heterogeneity of the samples and sharing information across them. We introduce a novel framework to model a collection of samples using dependent Dirichlet processes constructed through a thinning mechanism. The proposed approach modifies the stick-breaking representation of the Dirichlet process by thinning, that is, setting equal to zero a random subset of the beta random variables used in the original construction. This results in a collection of dependent random distributions that exhibit both shared and unique atoms, with the shared ones assigned distinct weights in each distribution. The generality of the construction allows expressing a wide variety of dependence structures among the elements of the generated random vectors. Moreover, its simplicity facilitates the characterization of several theoretical properties and the derivation of efficient computational methods for posterior inference. A simulation study illustrates how a modeling approach based on the proposed process reduces uncertainty in group-specific inferences while preventing excessive borrowing of information when the data indicate it is unnecessary. This added flexibility improves the accuracy of posterior inference, outperforming related state-of-the-art models. An application to the Collaborative Perinatal Project data highlights the model's capability to estimate group-specific densities and uncover a meaningful partition of the observations, both within and across samples, providing valuable insights into the underlying data structure.
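The thinning step described in the abstract can be illustrated with a small sketch. It assumes a truncated construction in which all distributions share the sticks $V_j\sim\text{Beta}(1,\alpha)$ and the atoms $\theta_j\sim P_0$, and distribution $i$ uses $V_{j,i}=\ell_{j,i}V_j$ with $\ell_{j,i}\in\{0,1\}$. The function name thinned_stick_breaking, the truncation level $J$, the choice $P_0=N(0,1)$, and the Bernoulli($\pi$) draw of $\ell$ are illustrative assumptions, not the authors' code.

```python
import numpy as np


def thinned_stick_breaking(alpha, ell, rng):
    """Truncated thinned stick-breaking weights (hypothetical helper).

    ell is a (J, d) array of 0/1 indicators: ell[j, i] = 0 sets the j-th beta
    variable to zero in distribution i, so atom j receives no mass in p_i.
    """
    ell = np.asarray(ell, dtype=float)
    J, d = ell.shape
    v = rng.beta(1.0, alpha, size=J)             # common sticks V_j ~ Beta(1, alpha)
    v_thin = v[:, None] * ell                    # thinning: V_{j,i} = ell_{j,i} * V_j
    # stick left over before position j, computed separately for each distribution
    remaining = np.vstack([np.ones(d), np.cumprod(1.0 - v_thin, axis=0)[:-1]])
    return v_thin * remaining                    # w_{j,i} = V_{j,i} * prod_{h<j} (1 - V_{h,i})


rng = np.random.default_rng(0)
J, pi = 500, 0.5
ell = (rng.random((J, 2)) < pi).astype(int)      # assumed Bernoulli(pi) thinning
w = thinned_stick_breaking(1.0, ell, rng)
theta = rng.normal(size=J)                       # shared atoms theta_j ~ P_0 = N(0, 1) (assumption)

p_a = w[theta <= 0].sum(axis=0)                  # p_1(A) and p_2(A) for A = (-inf, 0]
shared = (ell[:, 0] == 1) & (ell[:, 1] == 1)     # atoms with positive mass in both p_1 and p_2
print(w.sum(axis=0))                             # total mass per distribution (truncation error is tiny)
print(w[shared][:3])                             # the same atom carries different weights in p_1 and p_2
print(p_a)
```

Because the sticks are shared, an atom kept by both distributions appears in each, but with weights that depend on how many sticks each distribution has kept before it; atoms kept by only one distribution are its unique atoms.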

Paper Structure

This paper contains 29 sections, 11 theorems, 85 equations, and 17 figures.

Key Result

Proposition 1

If $p_{1:2}\sim\text{thinned-DDP}(\alpha,P_0,\ell_{1:2})$, then, for any $A\in\mathcal{B}$, the correlation between $p_{1}(A)$ and $p_{2}(A)$ is given by

$$\text{corr}\big(p_1(A),p_2(A)\big)=\sum_{j\geq 1}\ell_{j,1}\ell_{j,2}\,\frac{2}{\alpha+2}\left(\frac{\alpha}{\alpha+2}\right)^{s_j}\left(\frac{\alpha}{\alpha+1}\right)^{q_j},$$

where $s_j = \sum_{h=1}^{j-1}\ell_{h,1}\ell_{h,2}$ and $q_j = \sum_{h=1}^{j-1}\mathbb{I}_{\{\ell_{h,1}\neq \ell_{h,2}\}}$ are, respectively, the numbers of shared and distribution-specific atoms among the first $j-1$ components of $\ell_{1:2}$.
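A numerical check of Proposition 1 can be sketched as follows. It assumes the thinning acts as $V_{j,i}=\ell_{j,i}V_j$ on common sticks $V_j\sim\text{Beta}(1,\alpha)$ with shared atoms, truncates the sequence at $J$ positions, and fixes $P_0(A)=1/2$. Under these assumptions $\text{Cov}(p_1(A),p_2(A))=P_0(A)(1-P_0(A))\sum_j\mathbb{E}[w_{j,1}w_{j,2}]$, each marginal is a DP with variance $P_0(A)(1-P_0(A))/(\alpha+1)$, and every shared position $j$ contributes $\frac{2}{\alpha+2}\left(\frac{\alpha}{\alpha+2}\right)^{s_j}\left(\frac{\alpha}{\alpha+1}\right)^{q_j}$ to the correlation. The helpers thinned_ddp_corr and mc_corr are hypothetical and compare that closed form with a Monte Carlo estimate.

```python
import numpy as np


def thinned_ddp_corr(alpha, ell):
    """Closed-form corr(p_1(A), p_2(A)) for a truncated 0/1 thinning matrix ell of shape (J, 2)."""
    ell = np.asarray(ell, dtype=int)
    shared = ell[:, 0] * ell[:, 1]                    # 1 where both distributions keep V_j
    specific = (ell[:, 0] != ell[:, 1]).astype(int)   # 1 where exactly one keeps V_j
    # s_j and q_j count shared / specific positions among the first j-1 (exclusive cumsum)
    s = np.concatenate([[0], np.cumsum(shared)[:-1]])
    q = np.concatenate([[0], np.cumsum(specific)[:-1]])
    terms = shared * (2.0 / (alpha + 2)) * (alpha / (alpha + 2)) ** s * (alpha / (alpha + 1)) ** q
    return float(terms.sum())


def mc_corr(alpha, ell, n_rep, rng, p0_a=0.5):
    """Monte Carlo estimate of the same correlation; p0_a plays the role of P_0(A)."""
    ell = np.asarray(ell, dtype=float)
    J = ell.shape[0]
    v = rng.beta(1.0, alpha, size=(n_rep, J))         # common sticks V_j ~ Beta(1, alpha)
    vt = v[:, :, None] * ell[None, :, :]              # thinned sticks, shape (n_rep, J, 2)
    rem = np.concatenate(
        [np.ones((n_rep, 1, 2)), np.cumprod(1.0 - vt, axis=1)[:, :-1, :]], axis=1
    )
    w = vt * rem                                      # weights w_{j,i}
    in_a = rng.random((n_rep, J)) < p0_a              # indicator of theta_j in A (atoms are shared)
    p_a = (w * in_a[:, :, None]).sum(axis=1)          # p_i(A), shape (n_rep, 2)
    return float(np.corrcoef(p_a[:, 0], p_a[:, 1])[0, 1])


# Pattern repeating every 3 positions: shared, specific to p_2, specific to p_1.
J = 60
j = np.arange(J)
ell = np.column_stack([(j % 3 != 1), (j % 3 != 2)]).astype(int)
cf = thinned_ddp_corr(1.0, ell)
mc = mc_corr(1.0, ell, n_rep=20000, rng=np.random.default_rng(1))
print(cf, mc)
```

With $\alpha=1$ and this pattern, consecutive shared positions are separated by one shared and two specific atoms, so the shared terms form a geometric series with ratio $\frac{1}{3}\cdot\frac{1}{4}=\frac{1}{12}$ and the closed form equals $\frac{2/3}{1-1/12}=8/11\approx 0.727$; the Monte Carlo value should agree up to simulation error. Setting every $\ell_{j,i}=1$ gives correlation 1 (up to truncation), and disjoint patterns give 0.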

Figures (17)

  • Figure 1: Expected number of shared clusters ($\mathbb{E}[K_0]$, continuous line) and covariate-specific clusters ($\mathbb{E}[K_1]$ and $\mathbb{E}[K_2]$, dashed lines) in two samples of size $n=100$ from $p_1$ and $p_2$, with $p_{1:2}\sim\text{thinned-DDP}(1,P_0,\ell_{1:2})$. Left: Bernoulli thinning, with $\pi$ ranging in $(0,1)$; Right: eventually single-atom Poisson thinning, with $\lambda$ ranging in $(0,30)$.
  • Figure 2: Total expected number of clusters, $\mathbb{E}[K]$, in two samples from $p_1$ and $p_2$, with $p_{1:2}\sim\text{thinned-DDP}(1,P_0,\ell_{1:2})$, as a function of the size $n$ of the two samples. Left: Bernoulli thinning, with $\pi$ ranging in $\{0.1,0.5,0.9,1\}$; Right: eventually single-atom Poisson thinning, with $\lambda$ ranging in $\{0,2,5,10,20\}$. The dotted lines correspond to the theoretical bounds on $\mathbb{E}[K]$ derived in the paper.
  • Figure 3: Boxplots of the average adjusted Rand index (ARI) for the thinned-DDP, complete-pooling DP, and no-pooling DP mixtures on the simulated data.
  • Figure 4: Boxplots of the total variation (TV) distance for the thinned-DDP, complete-pooling DP, and no-pooling DP mixtures on the simulated data.
  • Figure 5: Boxplots of the average ARI for the thinned-DDP, CAM, and GM-DDP mixtures on the simulated data.
  • ...and 12 more figures
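Figure 2 reports $\mathbb{E}[K]$ as a function of $n$. A rough Monte Carlo sketch of that quantity follows, under stated assumptions: Bernoulli thinning is taken to mean $\ell_{j,i}\sim\text{Bernoulli}(\pi)$ independently (an assumption, since the thinning schemes are not defined in this summary), $K$ is taken to be the number of distinct atoms among the $2n$ observations, and the stick-breaking sequence is truncated at $J$ positions and renormalized. The helpers sample_weights and expected_total_clusters are hypothetical. For $\pi=1$ the two random measures coincide, so $K$ reduces to the number of distinct values in a sample of size $2n$ from a single DP, with expectation $\sum_{i=0}^{2n-1}\alpha/(\alpha+i)$; this gives a check on the simulation.

```python
import numpy as np


def sample_weights(alpha, ell, rng):
    """Thinned stick-breaking weights w_{j,i} for a 0/1 matrix ell of shape (J, d)."""
    v = rng.beta(1.0, alpha, size=ell.shape[0])[:, None] * ell   # V_{j,i} = ell_{j,i} V_j
    rem = np.vstack([np.ones(ell.shape[1]), np.cumprod(1.0 - v, axis=0)[:-1]])
    return v * rem


def expected_total_clusters(alpha, pi, n, n_rep=2000, J=200, seed=0):
    """Monte Carlo estimate of E[K], the number of distinct atoms in two samples of size n."""
    rng = np.random.default_rng(seed)
    ks = np.empty(n_rep)
    for r in range(n_rep):
        # assumed Bernoulli thinning: ell[j, i] ~ Bernoulli(pi), independently;
        # pi * J must be large enough that every column keeps many sticks
        ell = (rng.random((J, 2)) < pi).astype(float)
        w = sample_weights(alpha, ell, rng)
        w /= w.sum(axis=0)                          # renormalize away the truncation error
        # atoms are shared, so the index j identifies a cluster across both samples
        labels = np.concatenate([rng.choice(J, size=n, p=w[:, i]) for i in range(2)])
        ks[r] = np.unique(labels).size
    return ks.mean()


n = 50
dp_exact = sum(1.0 / (1.0 + i) for i in range(2 * n))   # alpha = 1, pi = 1: one DP sample of size 2n
print(expected_total_clusters(1.0, 1.0, n), dp_exact)
print(expected_total_clusters(1.0, 0.5, n))
```

Lowering $\pi$ leaves fewer shared sticks, so the two samples borrow less from each other and the estimate of $\mathbb{E}[K]$ grows; for small $\pi$ the truncation level $J$ should be increased accordingly.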

Theorems & Definitions (12)

  • Definition 1
  • Proposition 1
  • Proposition 2
  • Corollary 1
  • Proposition 3
  • Proposition 4
  • Proposition 5
  • Proposition 6
  • Corollary D.1
  • Corollary D.2
  • ...and 2 more