The evolving categories multinomial distribution: introduction with applications to movement ecology and vote transfer

Ricardo Carrizo Vergara, Marc Kéry, Trevor Hefley

TL;DR

The paper introduces the evolving categories multinomial (ECM) distribution for multivariate counts observed over time with evolving category structures, plus the ECM-Poisson variant for unknown total abundance $N$. It develops two practical inference strategies—Gaussian pseudo-likelihood (MGLE) and pairwise composite likelihood (MCLE)—that rely only on first- and second-order moment information, enabling estimation when full likelihoods are intractable. The ECM framework is demonstrated in movement ecology (space-time counts and OU movement, and translocated lesser prairie-chickens) and sociological/ecological inference (vote transfer in Chile's 2021 election), including a Chilean case study using MGLE. Key findings show consistent estimation as the number of individuals or rate $\lambda$ grows, that MCLE often outperforms MGLE at smaller sample sizes, and that the ECM structure provides a principled link between movement, counts, and evolving categories with applicability beyond the illustrated domains.
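The first- and second-order moments that both estimators rely on can be sketched from the iid structure alone. The notation below is an assumption for illustration and may differ from the paper's: $p_{l}^{(k)}$ denotes the probability that one individual is in category $l$ at time $k$, and $p_{l,l'}^{(k,k')}$ the probability that it is in category $l$ at time $k$ and in category $l'$ at time $k'$. Since each count $Q_{l}^{(k)}$ is a sum of $N$ iid indicators, the moments are $N$ times those of a single indicator:

```latex
% Assumed notation (not taken from the paper):
%   p_l^{(k)}          = P(individual in category l at time k)
%   p_{l,l'}^{(k,k')}  = P(in category l at time k and in l' at time k')
\begin{align*}
\mathbb{E}\big[Q_{l}^{(k)}\big] &= N\, p_{l}^{(k)}, \\
\operatorname{Cov}\big(Q_{l}^{(k)}, Q_{l'}^{(k)}\big)
  &= N \big( \delta_{l,l'}\, p_{l}^{(k)} - p_{l}^{(k)} p_{l'}^{(k)} \big), \\
\operatorname{Cov}\big(Q_{l}^{(k)}, Q_{l'}^{(k')}\big)
  &= N \big( p_{l,l'}^{(k,k')} - p_{l}^{(k)} p_{l'}^{(k')} \big), \qquad k \neq k'.
\end{align*}
```

The second line is the usual multinomial covariance at a single time; the third shows that dependence across times enters only through the two-times probabilities of a single individual.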

Abstract

We introduce the evolving categories multinomial (ECM) distribution for multivariate count data taken over time. This distribution models the counts of individuals following iid stochastic dynamics among categories, with the number and identity of the categories also evolving over time. We specify the one-time and two-times marginal distributions of the counts and the first- and second-order moments. When the total number of individuals is unknown, placing a Poisson prior on it yields a new distribution (ECM-Poisson), whose main properties we also describe. Since likelihoods are intractable or impractical, we propose two estimating functions for parameter estimation: a Gaussian pseudo-likelihood and a pairwise composite likelihood. We show two application scenarios: the inference of movement parameters of animals moving continuously in space-time with irregular survey regions, and the inference of vote transfer in two-round elections. We give three illustrations: a simulation study with Ornstein-Uhlenbeck moving individuals, paying special attention to the autocorrelation parameter; the inference of movement and behavior parameters of lesser prairie-chickens; and the estimation of vote transfer in the 2021 Chilean presidential election.
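To make the construction concrete, here is a minimal simulation sketch of counts of iid individuals moving among categories whose number changes over time. It assumes, purely for illustration, that each individual follows a Markov chain with rectangular transition matrices; the paper allows more general iid dynamics. The function names `simulate_ecm` and `simulate_ecm_poisson` and the example probabilities are hypothetical, not taken from the paper.

```python
import numpy as np

def simulate_ecm(n_individuals, initial_probs, transitions, rng):
    """Return one draw of category counts at each time.

    Assumption: individuals follow iid Markov dynamics. transitions[k] is a
    row-stochastic matrix of shape (m_k, m_{k+1}), so the number of
    categories may differ from one time to the next.
    """
    initial_probs = np.asarray(initial_probs, dtype=float)
    states = rng.choice(initial_probs.size, size=n_individuals, p=initial_probs)
    counts = [np.bincount(states, minlength=initial_probs.size)]
    for P in transitions:
        P = np.asarray(P, dtype=float)
        new_states = np.empty(n_individuals, dtype=int)
        # Move every individual independently, using the row of P that
        # corresponds to its current category.
        for l in range(P.shape[0]):
            mask = states == l
            if mask.any():
                new_states[mask] = rng.choice(
                    P.shape[1], size=int(mask.sum()), p=P[l]
                )
        states = new_states
        counts.append(np.bincount(states, minlength=P.shape[1]))
    return counts

def simulate_ecm_poisson(rate, initial_probs, transitions, rng):
    """Same construction with an unknown total: N is drawn from Poisson(rate)."""
    n_individuals = int(rng.poisson(rate))
    return simulate_ecm(n_individuals, initial_probs, transitions, rng)

# Hypothetical example: 3 categories at time 1, 2 at time 2, 4 at time 3.
INITIAL = [0.5, 0.3, 0.2]
TRANSITIONS = [
    [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]],
    [[0.25, 0.25, 0.25, 0.25], [0.1, 0.2, 0.3, 0.4]],
]

rng = np.random.default_rng(0)
counts = simulate_ecm(100, INITIAL, TRANSITIONS, rng)
for k, c in enumerate(counts, start=1):
    print(f"time {k}: {c.tolist()}")
```

Each count vector sums to the same total, since individuals are only reallocated among categories. In the Poisson variant the total itself is random, and only the counts, not the individual paths, would be observed in practice.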

Paper Structure

This paper contains 28 sections, 10 theorems, 71 equations, 9 figures, 4 tables.

Key Result

Proposition 2.1

The characteristic function $\varphi_{\bm{Q}} : \mathbb{R}^{m_{1} + ... + m_{n}} \to \mathbb{C}$ of $\bm{Q}$ is with $\bm{\xi} = \left( \xi_{l}^{(k)} \right)_{k \in \lbrace 1 , ... , n \rbrace, l \in \lbrace 1 , ... , m_{k} \rbrace} \in \mathbb{R}^{m_{1} + ... + m_{n}}$. For ease of exposition, we consider the arrangements of the form $(q_{l}^{(k)})_{k \in \lbrace 1 , ... , n \rbrace, l \in \lbrac

Figures (9)

  • Figure 1: Estimates of steady-state OU parameters $(\tau,\sigma,z_{1},z_{2})$ in the simulation study, with varying number of individuals $N$ and size rate $\lambda$ for ECM and ECM-Poisson (ECM-P) simulations respectively. Red box-plots correspond to MGLE, blue ones to MCLE. True parameter value indicated with a dotted gold line. Sample size $1000$ in every setting (except MGLE ECM-P $\lambda = 10^{2}$, see Appendix \ref{App:DetailsSimu}).
  • Figure 2: Estimates of size rates $\lambda$ in ECM-Poisson simulation studies with steady-state OU underlying movement. Red box-plots correspond to MGLE, blue ones to MCLE. True parameter value indicated with a dotted gold line. Sample size $1000$ in every setting (except MGLE $\lambda = 10^{2}$, see Appendix \ref{App:DetailsSimu}).
  • Figure 3: Lesser prairie-chicken abundance data constructed from telemetry data. Each plot shows the survey sub-squares of size $\Delta x = 0.1$ inside the domain $[-0.6,0.6]^{2}$ where the underlying moving individuals are counted at a given survey time $t_{k}$, $k=1,\ldots , 10$. Inside each sub-square, a number indicates how many individuals were counted there.
  • Figure 4: Parametric bootstrap frequency histograms for estimates of $(\tau,\sigma,t_{0},\alpha)$ with $1000$ samples. Point estimates in red, sample mean in blue, $95\%$ confidence interval limits in dotted black.
  • Figure 5: Correlogram of bootstrap samples of estimates of lesser prairie-chicken movement parameters $(\tau,\sigma,t_{0},\alpha)$. In histograms, true value marked with a dotted red line, sample mean marked with a dotted blue line. In point clouds, true pair of values marked with a red point, sample mean marked with an empty-interior blue circle. Sample size of $1000$.
  • ...and 4 more figures

Theorems & Definitions (11)

  • Definition 2.1
  • Proposition 2.1
  • Proposition 2.2
  • Proposition 2.3
  • Proposition 2.4
  • Proposition 3.1
  • Proposition 3.2
  • Proposition 3.3
  • Proposition 3.4
  • Proposition 4.1
  • ...and 1 more