Hamiltonian Monte Carlo Inference of Marginalized Linear Mixed-Effects Models

Jinlin Lai, Justin Domke, Daniel Sheldon

TL;DR

This work presents a method to analytically marginalize random effects in linear mixed-effects models (LMMs) within Hamiltonian Monte Carlo, using fast linear-algebra techniques (the matrix inversion and matrix determinant lemmas) and tree-structured design matrices to avoid cubic-time computations. By reducing the latent dimensionality while preserving exploitable structure, the approach improves sampling efficiency and runtime across diverse LMMs, especially on cognitive-science datasets, and can be implemented in probabilistic programming frameworks such as NumPyro. The authors extend the method to multiple effects under scaled-identity covariance assumptions with feasible preprocessing and per-sample costs, compare marginalization to non-centered parameterizations, and report practical gains in experiments covering cross-effects, vectorization benefits, and cognitive-science applications. The results suggest that marginalizing applicable random effects should be standard practice in Bayesian LMM inference, with implications for PPL workflows and scalable hierarchical modeling.

Abstract

Bayesian reasoning in linear mixed-effects models (LMMs) is challenging and often requires advanced sampling techniques like Markov chain Monte Carlo (MCMC). A common approach is to write the model in a probabilistic programming language and then sample via Hamiltonian Monte Carlo (HMC). However, there are many ways a user can transform a model that make inference more or less efficient. In particular, marginalizing some variables can greatly improve inference but is difficult for users to do manually. We develop an algorithm to easily marginalize random effects in LMMs. A naive approach introduces cubic time operations within an inference algorithm like HMC, but we reduce the running time to linear using fast linear algebra techniques. We show that marginalization is always beneficial when applicable and highlight improvements in various models, especially ones from cognitive sciences.
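
The "fast linear algebra techniques" mentioned in the abstract can be illustrated with the two standard identities named in the TL;DR. The block below is a sketch under an assumed model form, $\mathbf{u}\sim\mathcal{N}(\mathbf{0},\mathbf{\Sigma}_\mathbf{u})$ and $\mathbf{y}\mid\mathbf{u}\sim\mathcal{N}(\boldsymbol{\mu}+\mathbf{A}\mathbf{u},\mathbf{\Sigma}_\mathbf{y})$; the symbol $\boldsymbol{\mu}$ is introduced here for illustration, while $\mathbf{F}$ follows the notation of the Key Result. It is not a reproduction of the paper's derivation.

```latex
% Assumed model (illustrative): u ~ N(0, Sigma_u),  y | u ~ N(mu + A u, Sigma_y)
\begin{align*}
\mathbf{y} &\sim \mathcal{N}\!\left(\boldsymbol{\mu},\ \mathbf{\Sigma}_\mathbf{y} + \mathbf{A}\mathbf{\Sigma}_\mathbf{u}\mathbf{A}^T\right),
\qquad
\mathbf{F} = \mathbf{\Sigma}_\mathbf{u}^{-1} + \mathbf{A}^T\mathbf{\Sigma}_\mathbf{y}^{-1}\mathbf{A}, \\
% Matrix inversion (Woodbury) lemma
\left(\mathbf{\Sigma}_\mathbf{y} + \mathbf{A}\mathbf{\Sigma}_\mathbf{u}\mathbf{A}^T\right)^{-1}
  &= \mathbf{\Sigma}_\mathbf{y}^{-1}
   - \mathbf{\Sigma}_\mathbf{y}^{-1}\mathbf{A}\,\mathbf{F}^{-1}\mathbf{A}^T\mathbf{\Sigma}_\mathbf{y}^{-1}, \\
% Matrix determinant lemma
\det\!\left(\mathbf{\Sigma}_\mathbf{y} + \mathbf{A}\mathbf{\Sigma}_\mathbf{u}\mathbf{A}^T\right)
  &= \det(\mathbf{F})\,\det(\mathbf{\Sigma}_\mathbf{u})\,\det(\mathbf{\Sigma}_\mathbf{y}).
\end{align*}
```

With $N$ observations, the left-hand sides involve an $N\times N$ matrix, whereas the right-hand sides only need $\mathbf{\Sigma}_\mathbf{y}^{-1}$ (cheap when $\mathbf{\Sigma}_\mathbf{y}$ is diagonal) and operations on $\mathbf{F}$, whose size equals the number of random effects.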

Paper Structure

This paper contains 32 sections, 4 theorems, 43 equations, 6 figures, 6 tables, 2 algorithms.

Key Result

Theorem 1

If $\mathbf{\Sigma}_\mathbf{y}$ is diagonal and $\mathbf{\Sigma}_\mathbf{u}$ is block-diagonal with blocks of size $d\times d$, then $\mathbf{F}=\mathbf{\Sigma}_\mathbf{u}^{-1}+\mathbf{A}^T\mathbf{\Sigma}_\mathbf{y}^{-1}\mathbf{A}$ is also block-diagonal with $d\times d$ blocks and computing $\mathbf{A}\ldots$
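
The block-diagonal claim can be checked numerically on a toy problem. The NumPy sketch below is illustrative only: the data generator, the function names (`make_problem`, `build_F`, `off_block_max`, `logpdf_dense`, `logpdf_structured`), the zero-mean assumption, and the choice of a single $d\times d$ block shared by all groups are assumptions made here, not the paper's implementation. It builds a design matrix in which each observation belongs to exactly one group, confirms that $\mathbf{F}$ has no entries outside its $d\times d$ diagonal blocks, and compares a dense Gaussian log-density with one computed block by block.

```python
import numpy as np


def make_problem(n_obs=60, n_groups=5, d=2, seed=0):
    """Toy data (hypothetical): each observation belongs to exactly one group."""
    rng = np.random.default_rng(seed)
    groups = rng.integers(0, n_groups, size=n_obs)
    feats = rng.normal(size=(n_obs, d))
    feats[:, 0] = 1.0  # first column acts as a random intercept
    A = np.zeros((n_obs, n_groups * d))
    for j in range(d):
        A[np.arange(n_obs), groups * d + j] = feats[:, j]
    sigma_y = rng.uniform(0.5, 1.5, size=n_obs)  # diagonal of Sigma_y
    block = np.eye(d) + 0.3 * np.ones((d, d))    # one d x d block of Sigma_u
    y = rng.normal(size=n_obs)                   # assumed zero-mean response
    return y, A, sigma_y, block


def build_F(A, sigma_y, block):
    """F = Sigma_u^{-1} + A^T Sigma_y^{-1} A (formed densely for clarity)."""
    k = A.shape[1] // block.shape[0]
    prec_u = np.kron(np.eye(k), np.linalg.inv(block))
    return prec_u + A.T @ (A / sigma_y[:, None])


def off_block_max(F, d):
    """Largest absolute entry of F outside its d x d diagonal blocks."""
    k = F.shape[0] // d
    mask = np.kron(np.eye(k), np.ones((d, d)))
    return np.abs(F * (1.0 - mask)).max()


def logpdf_dense(y, A, sigma_y, block):
    """Reference: log N(y; 0, Sigma_y + A Sigma_u A^T) with an N x N covariance."""
    k = A.shape[1] // block.shape[0]
    cov = np.diag(sigma_y) + A @ np.kron(np.eye(k), block) @ A.T
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (quad + logdet + y.size * np.log(2.0 * np.pi))


def logpdf_structured(y, A, sigma_y, block):
    """Same density via the inversion and determinant lemmas, block by block."""
    d = block.shape[0]
    k = A.shape[1] // d
    F = build_F(A, sigma_y, block)
    # Only the diagonal blocks of F are used; the rest is zero.
    F_blocks = np.stack([F[j * d:(j + 1) * d, j * d:(j + 1) * d] for j in range(k)])
    b = (A.T @ (y / sigma_y)).reshape(k, d, 1)   # A^T Sigma_y^{-1} y, per group
    sol = np.linalg.solve(F_blocks, b)           # batched d x d solves
    quad = y @ (y / sigma_y) - np.sum(b * sol)
    logdet = (np.linalg.slogdet(F_blocks)[1].sum()
              + k * np.linalg.slogdet(block)[1]
              + np.log(sigma_y).sum())
    return -0.5 * (quad + logdet + y.size * np.log(2.0 * np.pi))


if __name__ == "__main__":
    y, A, sigma_y, block = make_problem()
    F = build_F(A, sigma_y, block)
    print("largest off-block entry of F:", off_block_max(F, block.shape[0]))
    print("dense log-density:     ", logpdf_dense(y, A, sigma_y, block))
    print("structured log-density:", logpdf_structured(y, A, sigma_y, block))
```

For brevity the sketch stores $\mathbf{A}$ and $\mathbf{F}$ as dense arrays, so it demonstrates the structure rather than any particular running time; an efficient version would accumulate each group's $d\times d$ block directly from that group's observations.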

Figures (6)

  • Figure 1: A tree-structured model conditioned on $\mathbf{\Theta}$.
  • Figure 2: Average ESS for each variable on the instruction evaluation model with different HMC strategies. Numbers above the sample size 100,000 indicate effective sampling.
  • Figure 3: Distribution of 10,000 samples for variable pairs $(\sigma_1, u_{1,1})$ and $(\sigma_2, u_{2,61})$ on the grouseticks model with different methods. We use M1 to represent marginalizing $\mathbf{u}_1$, M2 to represent marginalizing $\mathbf{u}_2$, R1 to represent reparameterizing $\mathbf{u}_1$, and R2 to represent reparameterizing $\mathbf{u}_2$. The number of divergences for each case is reported, with locations shown as red dots. We choose $u_{2,61}$ to demonstrate the distribution of divergences when reparameterizing $\mathbf{u}_2$.
  • Figure 4: Experimental results for the 9 cognitive science datasets with and without marginalization. Each experiment is performed 5 times with different random seeds. Marginalization usually improves sampling speed measured by iterations per second (iter/s) and sample efficiency measured by ESS per iteration (ESS/iter).
  • Figure 5: A tree-structured model conditioned on $\mathbf{\Theta}$.
  • ...and 1 more figure

Theorems & Definitions (8)

  • Theorem 1
  • proof
  • Theorem 1
  • proof
  • Lemma 1
  • proof
  • Theorem 2
  • proof