A capture-recapture hidden Markov model framework for register-based inference of population size and dynamics

Lucy Y Brown, Eleni Matechou, Bruno Santos, Eleonora Mussino

Abstract

Accurate inference on population dynamics, such as migration and changes in population size, is essential for policymaking, resource allocation and demographic research. Traditional censuses are expensive, infrequent and not timely, leading many countries to adopt register-based approaches to replace or complement them. A primary challenge is that such registers are incomplete: even when individuals are present, their activities may not generate records in specific registers, resulting in false negative observation error. Conversely, some registers arise from administrative or household-level processes, so that individuals may appear in registers despite being absent, leading to false positive observation error. Existing approaches often rely on ad hoc decisions that ignore one or both error types, offer inference on population snapshots but not dynamics, or are computationally too slow for practical use. We propose a scalable framework for inferring population size and dynamics from register data, building on Cormack-Jolly-Seber type capture-recapture models formulated as hidden Markov models. Inference is carried out using maximum likelihood estimation, with uncertainty quantified via the Bag of Little Bootstraps. The model accounts for temporary emigration, incorporates an arbitrary number of possibly interacting registers subject to both error types, and allows observation probabilities to vary with individual characteristics and unobservable heterogeneity. We illustrate the approach using Swedish population registers, where overcoverage (individuals registered as living in the country although they are no longer present) provides a motivating example. The application yields new insights into population dynamics and individual trajectories.
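
As an illustration of the hidden Markov formulation summarised in the abstract, the following is a minimal sketch of a scaled forward-algorithm likelihood for one individual's yearly record history in a single register. It is not the authors' model: it assumes only two hidden states (present, absent), one register, and no covariates or mixture groups. The function name and the parameters `phi` (probability of remaining present), `rho` (probability of returning after temporary emigration), `p` (probability of a record when present, so `1 - p` is the false negative rate), `q` (probability of a record when absent, the false positive rate) and `pi_present` (initial presence probability) are hypothetical names chosen for this example.

```python
import numpy as np

def forward_loglik(obs, phi, rho, p, q, pi_present=1.0):
    """Log-likelihood of one 0/1 register history under a two-state HMM.

    Hypothetical simplification: state 0 = present, state 1 = absent.
    """
    # Transition matrix between hidden states from one year to the next.
    T = np.array([[phi, 1.0 - phi],
                  [rho, 1.0 - rho]])
    # Emission matrix: rows are states, columns are y = 0 (no record), y = 1 (record).
    E = np.array([[1.0 - p, p],
                  [1.0 - q, q]])
    # Joint probability of the first observation and each initial state.
    alpha = np.array([pi_present, 1.0 - pi_present]) * E[:, obs[0]]
    loglik = 0.0
    for y in obs[1:]:
        # Rescale to avoid numerical underflow on long histories.
        c = alpha.sum()
        loglik += np.log(c)
        # Propagate one step through the hidden states, then weight by the emission.
        alpha = ((alpha / c) @ T) * E[:, y]
    return loglik + np.log(alpha.sum())

# Example: recorded, then missing for two years, then recorded again.
history = [1, 0, 0, 1]
print(forward_loglik(history, phi=0.9, rho=0.3, p=0.6, q=0.1, pi_present=0.8))
```

Under these assumptions, summing such log-likelihoods over individuals gives an objective that could be maximised numerically. The framework described above additionally handles several possibly interacting registers, individual covariates and unobserved heterogeneity, none of which is represented in this sketch.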

Paper Structure

This paper contains 22 sections, 11 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Estimated coefficients for life-event probabilities with $95\%$ confidence intervals for each covariate category. Numerical values are provided in the supplementary material.
  • Figure 2: Observation probability estimates for each register, broken down into FMM Group 1 (high job income observation probability) and Group 2 (low job income observation probability), with $95\%$ confidence intervals.
  • Figure 3: Proportion of registered individuals in $2016$ assigned to FMM groups 1 and 2 consistently ($\geq 90\%$ of bootstraps) and inconsistently, decomposed by sex, age and time since first entering Sweden (TIS).
  • Figure 4: Estimated probability of true presence for individuals observed only in the family income register, as a function of consecutive years of such observations. Panel A shows results by sex and panel B by country of birth. Shaded bands show $95\%$ confidence intervals.
  • Figure 5: Overcoverage estimates over time for the full proposed model and three reduced variants obtained by removing the FMM, false positive observation error modelling, or both. Shaded bands show $95\%$ confidence intervals.