Bayesian calculus and predictive characterizations of extended feature allocation models

Mario Beraha, Federico Camerlenghi, Lorenzo Ghilotti

TL;DR

This work develops a unified Bayesian framework for extended feature allocation models that allow interactions among features and dependence among their weights, grounding the analysis in point-process theory and Palm calculus. It derives closed-form marginal, posterior, and predictive distributions without restricting the prior to completely random measures (CRMs), and studies two predictive "sufficientness" postulates: one where the probability of discovering new features depends only on the sample size $n$ (characterized by Poisson priors) and another where it also depends on the number of observed features $k$ (characterized by mixed Poisson and mixed binomial priors). The paper specializes the general theory to notable priors, including Poisson, mixed Poisson, mixed binomial, and a novel determinantal point process prior; the latter lets the predictive distribution depend on the observed feature labels and captures repulsion among features. A key byproduct is a new characterization of the Poisson process through its reduced Palm kernel, together with tractable posterior forms for extended models. Applications in spatial statistics include estimating the total number of trees in a forest and locating the unseen trees. Overall, the framework offers principled guidance for prior elicitation in feature allocation models and extends the Bayesian nonparametric toolkit by combining predictive characterizations with interacting feature structures.

Abstract

We introduce and study a unified Bayesian framework for extended feature allocations which flexibly captures interactions -- such as repulsion or attraction -- among features and their associated weights. We provide a complete Bayesian analysis of the proposed model and specialize our general theory to noteworthy classes of priors. This includes a novel prior based on determinantal point processes, for which we show promising results in a spatial statistics application. Within the general class of extended feature allocations, we further characterize those priors that yield predictive probabilities of discovering new features depending either solely on the sample size or on both the sample size and the distinct number of observed features. These predictive characterizations, known as "sufficientness" postulates, have been extensively studied in the literature on species sampling models starting from the seminal contribution of the English philosopher W.E. Johnson for the Dirichlet distribution. Within the feature allocation setting, existing predictive characterizations are limited to very specific examples; in contrast, our results are general, providing practical guidance for prior selection. Additionally, our approach, based on Palm calculus, is analytical in nature and yields a novel characterization of the Poisson point process through its reduced Palm kernel.
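
As a hedged illustration of the first postulate (new-feature probabilities depending only on the sample size), the sketch below simulates the Indian buffet process. That model is not named in this summary; it is assumed here as a familiar feature allocation scheme with a Poisson-type prior, in which observation $i$ displays a Poisson($\alpha/i$) number of new features regardless of which features were seen before. The function names and the parameter `alpha` are hypothetical choices for this example.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def simulate_ibp(n, alpha, rng):
    """Return the number of new features displayed by each of n observations.

    Assumed model (illustration only): Indian buffet process with mass alpha.
    """
    counts = []        # counts[j] = how many observations display feature j
    new_per_obs = []
    for i in range(1, n + 1):
        # Previously seen feature j is displayed with probability counts[j] / i.
        for j in range(len(counts)):
            if rng.random() < counts[j] / i:
                counts[j] += 1
        # New features: the rate alpha / i depends on the sample size only,
        # not on how many or which features have been observed so far.
        new = sample_poisson(alpha / i, rng)
        counts.extend([1] * new)
        new_per_obs.append(new)
    return new_per_obs


if __name__ == "__main__":
    rng = random.Random(0)
    runs = [simulate_ibp(10, 2.0, rng) for _ in range(2000)]
    mean_total = sum(sum(r) for r in runs) / len(runs)
    harmonic = sum(1.0 / i for i in range(1, 11))
    print("simulated mean number of features:", mean_total)
    print("theoretical value alpha * H_n:    ", 2.0 * harmonic)
```

Under this assumed model the expected total number of features after $n$ observations is $\alpha \sum_{i=1}^{n} 1/i$, which the simulated mean should approach as the number of runs grows.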

Paper Structure

This paper contains 40 sections, 23 theorems, 84 equations, 7 figures.

Key Result

Theorem 1

Let $\bm{Z}$ be a sample from the statistical model \eqref{eq:representation_theorem}, where $\mu$ is the functional of a point process $\Psi$ defined via \eqref{eq:mu_definition}. The probability that $\bm{Z}$ displays $k$ features labeled $\bm x^* = (x^*_1, \ldots, x^*_k)$ with corresponding vector of frequency c where $\rho^{(k)}(\mathrm d \bm s \,|\, \bm x)$ and $\tilde{m}_{\xi}^{(k)}(\mathrm d \bm x)$ are de...

Figures (7)

  • Figure 1: Posterior distribution of the total number of trees in the synthetic scenario of \ref{sec:synthetic}. From left to right: calculations performed using Le Cam's approximation of the Poisson-binomial, exact computations, and posterior of $\xi^!_{\bm x^*}(\mathbb{X})$. Different line colors correspond to different sample sizes; the black vertical line indicates the true number of trees.
  • Figure 2: Locating the unobserved trees for $n = 15$ in the synthetic scenario of \ref{sec:synthetic}: infinitesimal probability of observing an unseen tree in a given location. Left plot: the mean measure of $\xi^\prime$. Right plot: the mean measure of $\xi^!_{\bm x^*}$. The red dots represent the observed trees in the sample. The black crosses indicate the unseen trees. Note that the color scales of the two plots are different.
  • Figure 3: Locating the unobserved trees for $n \in \{10,20,30\}$ in the analysis of the spruces dataset of \ref{sec:spruces}: infinitesimal probability of observing an unseen tree in a given location. The three plots report $M_{\xi^\prime}$ for the three sample sizes. The red dots represent the observed trees in the sample. The black crosses indicate the unseen trees. Note that the plots have different color scales.
  • Figure S1: Posterior distribution of the total number of trees. From left to right: calculations performed using Le Cam's approximation of the Poisson-binomial, exact computations, and posterior of $\xi^!_{\bm x^*}(\mathbb{X})$. Different line colors correspond to different sample sizes; the black vertical line indicates the true number of trees.
  • Figure S2: Locating the unobserved trees for $n = 15$: infinitesimal probability of observing an unseen tree in a given location. Left plot: the mean measure of $\xi^\prime$. Right plot: the mean measure of $\xi^!_{\bm x^*}$. The red dots represent the observed trees in the sample. The black crosses indicate the unseen trees. Note that the color scales of the two plots are different.
  • ...and 2 more figures
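
Figures 1 and S1 contrast Le Cam's approximation of the Poisson-binomial with exact computations. The sketch below shows, in generic terms, what such a comparison can look like: it computes the exact distribution of a sum of independent Bernoulli variables by dynamic programming and compares it with a Poisson distribution of the same mean, using Le Cam's inequality $\sum_k |P(S = k) - \mathrm{Poisson}_\lambda(k)| < 2\sum_i p_i^2$ as a check. The success probabilities and function names are hypothetical and are not taken from the paper's experiments.

```python
import math


def poisson_binomial_pmf(probs):
    """Exact PMF of a sum of independent Bernoulli(p_i), by dynamic programming."""
    pmf = [1.0]  # distribution of the empty sum: all mass at 0
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this trial fails
            nxt[k + 1] += mass * p       # this trial succeeds
        pmf = nxt
    return pmf


def poisson_pmf(lam, k):
    """Poisson(lam) probability of the value k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def le_cam_gap(probs):
    """Return (L1 distance to Poisson(sum p_i), Le Cam bound 2 * sum p_i^2)."""
    exact = poisson_binomial_pmf(probs)
    lam = sum(probs)
    # Differences on the support 0..n of the Poisson-binomial.
    gap = sum(abs(e - poisson_pmf(lam, k)) for k, e in enumerate(exact))
    # Beyond n the exact PMF is zero, so the Poisson tail mass counts in full.
    tail = 1.0 - sum(poisson_pmf(lam, k) for k in range(len(exact)))
    gap += max(tail, 0.0)
    bound = 2.0 * sum(p * p for p in probs)
    return gap, bound


if __name__ == "__main__":
    probs = [0.05, 0.10, 0.02, 0.08]  # hypothetical success probabilities
    gap, bound = le_cam_gap(probs)
    print("L1 distance:", gap)
    print("Le Cam bound:", bound)
```

The bound tightens as the individual probabilities shrink, which is the regime in which replacing the exact Poisson-binomial by a Poisson law loses little accuracy.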

Theorems & Definitions (35)

  • Example 1: Poisson and mixed Poisson processes
  • Example 2: Binomial and mixed binomial processes
  • Remark 1
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4: Sufficientness postulate for the dependence on $n$
  • Lemma 1
  • Theorem 5: Sufficientness postulate for the dependence on $n$ and $k$
  • Lemma 2
  • ...and 25 more