A Hitchhiker's Guide to Poisson Gradient Estimation
Michael Ibrahim, Hanqi Zhao, Eli Sennesh, Zhi Li, Anqi Wu, Jacob L. Yates, Chengrui Li, Hadi Vafaii
TL;DR
This work tackles the challenge of differentiating through Poisson-distributed latent variables, a common scenario in neuroscience-inspired models. It compares the Exponential Arrival Time (EAT) and Gumbel-SoftMax (GSM) relaxations and introduces EAT_cubic, a cubic Hermite-based relaxation with compact support that yields an unbiased first moment for τ ≤ 1 and improved distributional fidelity. Leveraging Campbell's theorem, the authors derive closed-form expressions for the EAT moments and show, through theory and experiments, that EAT_cubic better preserves Poisson statistics (especially the mean and variance) while remaining robust to the choice of temperature; in downstream ELBO performance on Poisson VAE and partially observable GLM (POGLM) tasks, it often matches exact gradients. They also provide a gradient analysis showing that distributional fidelity and gradient quality capture complementary facets of relaxation performance, and close with practical recommendations for choosing and tuning Poisson relaxations. Overall, the results highlight distributional fidelity as a crucial factor and offer a robust, temperature-insensitive option for Poisson gradient estimation with broad NeuroAI applicability.
Abstract
Poisson-distributed latent variable models are widely used in computational neuroscience, but differentiating through discrete stochastic samples remains challenging. Two approaches address this: Exponential Arrival Time (EAT) simulation and Gumbel-SoftMax (GSM) relaxation. We provide the first systematic comparison of these methods, along with practical guidance. Our main technical contribution is a modification to the EAT method that theoretically guarantees an unbiased first moment (exactly matching the firing rate) and reduces second-moment bias. We evaluate these methods on their distributional fidelity, gradient quality, and performance on two tasks: (1) variational autoencoders with Poisson latents, and (2) partially observable generalized linear models, where latent neural connectivity must be inferred from observed spike trains. Across all metrics, our modified EAT method exhibits better overall performance (often comparable to exact gradients) and substantially higher robustness to hyperparameter choices. Together, our results clarify the trade-offs between these methods and offer concrete recommendations for practitioners working with Poisson latent variable models.
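To make the unbiased-first-moment claim concrete, here is a minimal Python sketch of an arrival-time relaxation. It is an illustration under stated assumptions, not the authors' implementation: the relaxed count is taken to be the sum of a soft indicator of "arrival before t = 1" over the arrival times of a rate-λ Poisson process, the cubic kernel is assumed to be a smoothstep centered at t = 1 with half-width τ, and the logistic kernel is an assumed point of comparison. All function names are hypothetical. By Campbell's theorem, the mean of such a sum is λ times the integral of the kernel over [0, ∞), so the first moment is unbiased exactly when that integral equals 1.

```python
import math
import random


def sigmoid_window(t, tau):
    """Logistic soft indicator of the event t < 1 (assumed comparison kernel)."""
    z = (1.0 - t) / tau
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0 here, so exp cannot overflow
    return e / (1.0 + e)


def cubic_window(t, tau):
    """Cubic Hermite (smoothstep) soft indicator; assumed form, zero for t >= 1 + tau."""
    u = (1.0 + tau - t) / (2.0 * tau)  # u = 1 at t = 1 - tau, u = 0 at t = 1 + tau
    u = min(1.0, max(0.0, u))
    return u * u * (3.0 - 2.0 * u)


def window_integral(window, tau, horizon, n=20000):
    """Midpoint-rule estimate of the integral of window(t, tau) over [0, horizon]."""
    h = horizon / n
    return h * sum(window((i + 0.5) * h, tau) for i in range(n))


def relaxed_count(rate, window, tau, horizon, rng):
    """One relaxed sample: sum of kernel values at exponential arrival times."""
    total, t = 0.0, rng.expovariate(rate)
    while t < horizon:
        total += window(t, tau)
        t += rng.expovariate(rate)
    return total


def mean_relaxed_count(rate, window, tau, horizon, n_samples=20000, seed=0):
    """Monte Carlo estimate of the mean relaxed count."""
    rng = random.Random(seed)
    draws = (relaxed_count(rate, window, tau, horizon, rng) for _ in range(n_samples))
    return sum(draws) / n_samples


if __name__ == "__main__":
    tau, rate = 0.5, 3.0
    # Campbell's theorem: E[relaxed count] = rate * integral of the kernel.
    print("cubic kernel integral:  ", window_integral(cubic_window, tau, 1.0 + tau))
    print("sigmoid kernel integral:", window_integral(sigmoid_window, tau, 1.0 + 40.0 * tau))
    print("cubic Monte Carlo mean: ", mean_relaxed_count(rate, cubic_window, tau, 1.0 + tau))
```

Under these assumptions the cubic kernel integrates to 1 whenever τ ≤ 1, because the transition region [1 - τ, 1 + τ] then lies entirely in t ≥ 0 and the smoothstep is antisymmetric about t = 1. The logistic kernel in this sketch integrates to τ ln(1 + e^{1/τ}), which exceeds 1, so its mean overshoots the rate. The compact support also means the arrival-time simulation can stop at t = 1 + τ without truncation error.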
