Composite likelihood inference for the Poisson log-normal model
Julien Stoehr, Stephane S. Robin
TL;DR
This paper tackles parameter inference in the Poisson log-normal (PLN) model for multivariate count data by combining the EM algorithm with composite likelihood and importance sampling (ISEM). Using a block-wise composite likelihood and a Gaussian mixture proposal, the method retains maximum-likelihood-like asymptotic properties and valid uncertainty quantification for moderately high-dimensional problems, while avoiding the computational bottleneck of high-dimensional latent integration. The authors derive the composite-likelihood EM (CL-EM) updates, estimate the variance of the estimator through the Godambe information, and provide robust block designs for estimating the within-block latent covariances. Empirical results on simulated data and on the Barents Sea fish dataset show that CL-ISEM yields reliable inference with competitive, scalable performance relative to variational approaches, with the added advantage of principled standard errors and hypothesis tests.
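For reference, the block-wise composite likelihood and the Godambe-based variance mentioned above can be written as follows. This is a sketch in the standard notation of the composite likelihood literature, assuming the common PLN parametrization with covariates $x_i$, regression coefficients $B$ and latent covariance $\Sigma$, so that $\theta = (B, \Sigma)$; the blocks $\mathcal{B}_1, \dots, \mathcal{B}_K$ and all other symbols are illustrative and may differ from the paper's.

```latex
\begin{aligned}
&Z_i \sim \mathcal{N}_p\big(B^\top x_i, \Sigma\big), \qquad
  Y_{ij} \mid Z_{ij} \sim \mathcal{P}\big(e^{Z_{ij}}\big) \text{ independently}, \quad j = 1, \dots, p, \\
&c\ell(\theta) = \sum_{i=1}^{n} c\ell_i(\theta), \qquad
  c\ell_i(\theta) = \sum_{k=1}^{K} \log p_\theta\big(Y_{i\mathcal{B}_k}\big), \\
&p_\theta(y_{i\mathcal{B}}) = \int \prod_{j \in \mathcal{B}} \mathcal{P}\big(y_{ij}; e^{z_j}\big)\,
  \phi\big(z_{\mathcal{B}}; (B^\top x_i)_{\mathcal{B}}, \Sigma_{\mathcal{B}\mathcal{B}}\big)\, \mathrm{d}z_{\mathcal{B}}, \\
&H_n(\theta) = -\frac{1}{n}\sum_{i=1}^{n} \mathbb{E}\big[\nabla^2_\theta\, c\ell_i(\theta)\big], \qquad
  J_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} \operatorname{Var}\big[\nabla_\theta\, c\ell_i(\theta)\big], \\
&G_n(\theta) = H_n(\theta)\, J_n(\theta)^{-1} H_n(\theta), \qquad
  \hat\theta_{\mathrm{CL}} \approx \mathcal{N}\big(\theta, \{n\, G_n(\theta)\}^{-1}\big) \text{ for large } n.
\end{aligned}
```

Because $\Sigma_{jk}$ enters $c\ell$ only through blocks that contain both $j$ and $k$, the choice of blocks determines which latent covariances can be estimated at all.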
Abstract
The Poisson log-normal model is a latent variable model that provides a generic framework for the analysis of multivariate count data. Inferring its parameters can be a daunting task, since the conditional distribution of the latent variables given the observed ones is intractable. For this model, variational approaches are the gold-standard solution: they are computationally efficient but lack theoretical guarantees on the estimates. Sampling-based solutions are quite the opposite. We first define a Monte Carlo EM algorithm that can attain maximum likelihood estimates, but that is computationally efficient only for low-dimensional latent spaces. We then propose a novel inference procedure combining the EM framework with composite likelihood and importance sampling estimates. The algorithm preserves the desirable asymptotic properties of maximum likelihood estimators while circumventing the high-dimensional integration bottleneck, thus remaining computationally feasible for moderately large datasets. This approach enables grounded parameter estimation, confidence intervals, and hypothesis testing. Application to the Barents Sea fish dataset demonstrates the algorithm's capacity to identify significant environmental effects and residual interspecies correlations.
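The combination of composite likelihood and importance sampling described above can be illustrated with a minimal sketch, assuming a PLN model without covariates or offsets (the latent mean vector is passed directly as `mu`). Each block marginal log-likelihood log p(y_iB) is estimated by importance sampling, and the composite log-likelihood sums these estimates over observations and blocks. All function names (`log_poisson_pmf`, `mvn_logpdf`, `is_block_loglik`, `composite_loglik`) are hypothetical, and the single Gaussian proposal centred at log(y + 0.5) is a simplification standing in for the authors' Gaussian mixture proposal. This is not the authors' CL-ISEM implementation, which additionally embeds such estimates in EM updates.

```python
import numpy as np
from math import lgamma


def log_poisson_pmf(y, log_rate):
    """log P(Y = y) for Y ~ Poisson(exp(log_rate)); broadcasts over rows of log_rate."""
    log_fact = np.array([lgamma(v + 1.0) for v in np.atleast_1d(y)])
    return y * log_rate - np.exp(log_rate) - log_fact


def mvn_logpdf(x, mean, cov):
    """Log-density of N(mean, cov) evaluated at each row of x (shape M x d)."""
    d = mean.shape[0]
    chol = np.linalg.cholesky(cov)
    diff = np.linalg.solve(chol, (x - mean).T)  # whitened residuals, shape d x M
    maha = np.sum(diff ** 2, axis=0)
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def is_block_loglik(y_b, mu_b, sigma_bb, n_samples=2000, rng=None):
    """Importance-sampling estimate of log p(y_B) = log of the integral of p(y_B | z) N(z; mu_B, Sigma_BB) dz."""
    rng = np.random.default_rng(rng)
    # Assumed proposal (simplification): Gaussian centred at log(y + 0.5),
    # sharing the block's prior covariance.
    prop_mean = np.log(y_b + 0.5)
    z = rng.multivariate_normal(prop_mean, sigma_bb, size=n_samples)  # shape (M, |B|)
    log_w = (log_poisson_pmf(y_b, z).sum(axis=1)       # log p(y_B | z)
             + mvn_logpdf(z, mu_b, sigma_bb)           # log prior density
             - mvn_logpdf(z, prop_mean, sigma_bb))     # minus log proposal density
    # Log of the mean importance weight, computed stably (log-mean-exp).
    top = log_w.max()
    return top + np.log(np.mean(np.exp(log_w - top)))


def composite_loglik(Y, mu, sigma, blocks, n_samples=2000, seed=0):
    """Block-wise composite log-likelihood: sum over observations and blocks of log p(y_iB)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for block in blocks:
        idx = np.asarray(block)
        sigma_bb = sigma[np.ix_(idx, idx)]  # marginal latent covariance of the block
        for y_i in Y:
            total += is_block_loglik(y_i[idx], mu[idx], sigma_bb, n_samples, rng)
    return total
```

With a diagonal `sigma` the latent coordinates are independent, so a single two-species block and two singleton blocks give nearly the same value up to Monte Carlo error. With a non-diagonal `sigma`, only blocks containing both species carry information about their latent covariance, which is why the block design matters.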
