Maximum likelihood estimation for spinal-structured trees

Romain Azaïs, Benoît Henry

TL;DR

The paper addresses the estimation of birth distributions in spinal-structured multi-type Galton-Watson trees with unobserved types. It focuses on a two-type model in which normal individuals reproduce according to a birth distribution $\mu$ and the spine is governed by a latent function $f$. It develops a maximum-likelihood-like approach within the spinal-structured tree (SST) framework, using an 'Ugly Duckling' spine selector to identify the spine and then correcting the estimate of $\mu$ to estimate $f$. Almost sure convergence is proved under the condition $\log m(\mu)-\mathfrak{D}(\mathcal{B}\mu,\nu)<0$. Theoretical contributions include a rate-function analysis of large deviations in the selection of the spine sample and continuity results, extending identifiability across the subcritical, critical, and supercritical growth regimes. The paper also provides extensive simulations validating consistency and proposes an asymptotic test to distinguish populations conditioned on survival from those that are not, with practical implications for inference in conditioned branching processes.

Abstract

We investigate some aspects of the problem of estimating birth distributions (BD) in multi-type Galton-Watson trees (MGW) with unobserved types. More precisely, we consider two-type MGW called spinal-structured trees. Such a tree is characterized by a spine of special individuals whose BD $\nu$ differs from that of the other individuals in the tree (called normal, whose BD is denoted $\mu$). In this work, we show that even in such a highly structured two-type population, our ability to distinguish the two types and to estimate $\mu$ and $\nu$ is constrained by a trade-off between the growth rate of the population and the similarity of $\mu$ and $\nu$. Indeed, if the growth rate is too large, large-deviation events are likely to be observed in the sampling of the normal individuals, preventing us from distinguishing them from special ones. Roughly speaking, our approach succeeds if $r<\mathfrak{D}(\mu,\nu)$, where $r$ is the exponential growth rate of the population and $\mathfrak{D}$ is a divergence measuring the dissimilarity between $\mu$ and $\nu$.
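
The trade-off above involves the exponential growth rate $r$ of the population. As a minimal sketch, assuming (as the condition $\log m(\mu)-\mathfrak{D}(\mathcal{B}\mu,\nu)<0$ in the TL;DR suggests) that $r$ is governed by $\log m(\mu)$, where $m(\mu)=\sum_k k\,\mu(k)$ is the mean of the normal birth distribution, the following Python snippet computes $m(\mu)$, $\log m(\mu)$ and the corresponding growth regime. The divergence $\mathfrak{D}$ is not defined in this summary, so it is deliberately left out; the function names `mean_offspring` and `growth_regime` are hypothetical.

```python
import math


def mean_offspring(mu):
    """Mean m(mu) = sum_k k * mu(k) of a birth distribution (mu(0), mu(1), ...)."""
    return sum(k * p for k, p in enumerate(mu))


def growth_regime(mu, tol=1e-12):
    """Return (m, log m, regime) for the birth distribution mu.

    Assumption: the regime is classified by m(mu) alone (m < 1, m = 1, m > 1).
    """
    m = mean_offspring(mu)
    log_m = math.log(m) if m > 0 else -math.inf
    if abs(m - 1.0) <= tol:
        regime = "critical"
    elif m < 1.0:
        regime = "subcritical"
    else:
        regime = "supercritical"
    return m, log_m, regime


# One example distribution per regime; the first is the normal BD used in Figure 4.
for mu in [(0.55, 0.2, 0.25), (0.25, 0.5, 0.25), (0.1, 0.2, 0.7)]:
    m, log_m, regime = growth_regime(mu)
    print(f"mu={mu}: m={m:.3f}, log m={log_m:.3f}, {regime}")
```

For instance, the normal birth distribution $(0.55,0.2,0.25)$ has mean $0.2+2\times 0.25=0.7$ and is therefore subcritical, with $\log m(\mu)<0$.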

Paper Structure

This paper contains 22 sections, 8 theorems, 72 equations, 4 figures, 2 tables.

Key Result

Proposition 2.1

Let $T$ be a spinal-structured tree observed until generation $h$ and $v$ an observed node of $T$.
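
The statement of Proposition 2.1 is truncated here, so its exact identification criterion is not reproduced. Figure 1 shows, however, that in a tree observed until generation $h$ only some nodes can be assigned a type. As a hypothetical illustration of why, the sketch below applies one elementary deduction that holds whenever the spine is an infinite line of descent starting at the root (an assumption made here, not taken from the source): a node with no descendant at generation $h$ cannot be special, and if a single node of a generation has descendants at generation $h$, the spine must pass through it. This rule is not claimed to be Proposition 2.1; `identify_types` and the dictionary-of-children encoding of the tree are hypothetical.

```python
from collections import defaultdict


def identify_types(children, h, root=0):
    """Label nodes of a tree observed until generation h as special/normal/unknown.

    children: dict mapping a node to the list of its children.
    Assumption: the spine is an infinite line of descent starting at the root.
    """
    # Breadth-first traversal restricted to generations 0..h
    # (iterating over a list while appending to it is valid in Python).
    depth, order = {root: 0}, [root]
    for v in order:
        if depth[v] < h:
            for c in children.get(v, []):
                depth[c] = depth[v] + 1
                order.append(c)

    # reaches[v] is True if v is at generation h or has a descendant there;
    # reversed BFS order processes children before their parents.
    reaches = {}
    for v in reversed(order):
        if depth[v] == h:
            reaches[v] = True
        else:
            reaches[v] = any(reaches[c] for c in children.get(v, []))

    by_generation = defaultdict(list)
    for v in order:
        by_generation[depth[v]].append(v)

    types = {}
    for nodes in by_generation.values():
        alive = [v for v in nodes if reaches[v]]
        for v in nodes:
            if not reaches[v]:
                types[v] = "normal"   # a spine node always has descendants at generation h
            elif len(alive) == 1:
                types[v] = "special"  # the spine must go through the only surviving line
            else:
                types[v] = "unknown"  # several candidate lines: type not identifiable
    return types


tree = {0: [1, 2], 1: [3], 2: [], 3: [4, 5]}
print(identify_types(tree, h=3))
```

Nodes labelled `"unknown"` play the role of the white, unidentified nodes in Figure 1: near the observation horizon several lines usually survive, so the deduction cannot decide between them.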

Figures (4)

  • Figure 1: A spinal-structured tree simulated until generation $30$ with normal nodes in blue and special nodes in red (left). We assume that it is observed until generation $h=15$ and identify the type of the nodes using Proposition 2.1 (right) with the following color code: light blue for identified normal nodes, light red for identified special nodes, gray for unobserved nodes, and white for unidentified types.
  • Figure 2: Average estimation error of the unknown parameters (orange and red: $\mu$, blue: $f$, light blue: $\nu$, and green: $\mathcal{S}$) of a spinal-structured tree as a function of the maximum observed height, in the 3 growth regimes (top left: subcritical, top right: critical, and bottom left: supercritical). Parameter values are given in the table of parameters.
  • Figure 3: Estimates of $h\,\mathbb{V}\text{ar}(\widehat{f}_{h}(1)-f(1))$ from samples of $100$ spinal-structured trees simulated with various parameters $\mu$ and $f$ with $0.5<m(\mu)<1$ and $0<f(1)<0.5$ (left) and empirical distribution of the reduced centered error $\sqrt{h}(\widehat{f}_{h}(1)-f(1))/\sigma(\widehat{\mu}_{h},\widehat{f}_{h})$ for some of these parameters (right) with a comparison to the Gaussian distribution (thick black line).
  • Figure 4: Empirical distribution of the test statistic $Q_h$ obtained from samples of $100$ Kesten's trees with normal birth distribution $(0.55,0.2,0.25)$ (left) and from samples of $100$ Galton-Watson trees with competition (right), both with a comparison to the $\chi^2(1)$ distribution (red line).
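
Figure 4 uses Kesten's trees with normal birth distribution $(0.55,0.2,0.25)$. Kesten's tree is a classical spinal-structured tree: in its standard construction, each spine node has a number of children drawn from the size-biased distribution $\nu(k)=k\,\mu(k)/m(\mu)$, one of these children, chosen uniformly at random, continues the spine, and all other nodes reproduce according to $\mu$. The following minimal Python sketch simulates this construction up to generation $h$. It is an illustration, not the authors' simulation code, and the function names `size_biased` and `simulate_kesten` are hypothetical.

```python
import random


def size_biased(mu):
    """Size-biased law nu(k) = k * mu(k) / m(mu) used on the spine of Kesten's tree."""
    m = sum(k * p for k, p in enumerate(mu))
    return [k * p / m for k, p in enumerate(mu)]


def simulate_kesten(mu, h, rng=None):
    """Simulate Kesten's tree with normal birth distribution mu up to generation h.

    Returns (children, special): children maps each node of generations 0..h-1
    to the list of its children, and special is the set of spine nodes.
    """
    if rng is None:
        rng = random.Random()
    nu = size_biased(mu)
    support = range(len(mu))
    children, special = {}, {0}  # node 0 is the root, which starts the spine
    generation, next_id = [0], 1
    for _ in range(h):
        new_generation = []
        for v in generation:
            is_special = v in special
            # Special nodes reproduce according to nu, normal nodes according to mu.
            n = rng.choices(support, weights=nu if is_special else mu)[0]
            kids = list(range(next_id, next_id + n))
            next_id += n
            children[v] = kids
            if is_special:
                # nu(0) = 0, so a spine node always has at least one child;
                # one of them, chosen uniformly at random, continues the spine.
                special.add(rng.choice(kids))
            new_generation.extend(kids)
        generation = new_generation
    return children, special


children, special = simulate_kesten((0.55, 0.2, 0.25), 15, random.Random(1))
n_nodes = 1 + sum(len(kids) for kids in children.values())
print(f"{n_nodes} nodes observed until generation 15, {len(special)} on the spine")
```

By construction the spine contains exactly one node per generation, so a tree observed until generation $15$ has $16$ special nodes, while the normal subtrees hanging off the spine are subcritical since $m(\mu)=0.7$.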

Theorems & Definitions (10)

  • Proposition 2.1
  • Theorem 3.1
  • Remark 3.1
  • Theorem 3.2
  • Theorem 5.1
  • Lemma 5.1
  • Remark 5.1
  • Proposition 6.1
  • Lemma 6.1
  • Lemma 6.2