Conjugate gradient methods for high-dimensional GLMMs
Andrea Pandolfi, Omiros Papaspiliopoulos, Giacomo Zanella
TL;DR
This work addresses the computational bottleneck in fitting high-dimensional GLMMs, where posterior precision ${\boldsymbol{Q}}$ combines a diagonal prior with a dense likelihood contribution ${\boldsymbol{V}}^{T} {\boldsymbol{\Omega}} {\boldsymbol{V}}$. It shows that exact Cholesky factorization becomes impractical for typical crossed designs because of fill-ins, and proposes a conjugate gradient (CG) sampler that solves ${\boldsymbol{Q}}{\boldsymbol{\theta}}={\boldsymbol{b}}$ to draw approximate Gaussian samples, with a total cost that scales linearly in $N$ and $p$ under realistic spectra. The paper develops a detailed spectral analysis, including Jacobi preconditioning and connections to the adjacency structure of a multi-partite graph, to prove that CG converges in a dimension-free number of iterations for many designs, while identifying graph-structural conditions under which convergence deteriorates. Numerical experiments on real and large-scale simulated data (including voter turnout, instructor evaluations, and MovieLens-25M) confirm the theory: CG often matches or outperforms Cholesky in cost and scales to millions of observations, though nested designs can slow convergence. The results provide principled guidelines on when CG-based sampling is advantageous and offer a framework tying linear-algebraic performance to the underlying graph structure of crossed random effects.
Abstract
Generalized linear mixed models (GLMMs) are a widely used tool in statistical analysis. The main bottleneck of many computational approaches lies in the inversion of the high dimensional precision matrices associated with the random effects. Such matrices are typically sparse; however, the sparsity pattern resembles a multi partite random graph, which does not lend itself well to default sparse linear algebra techniques. Notably, we show that, for typical GLMMs, the Cholesky factor is dense even when the original precision is sparse. We thus turn to approximate iterative techniques, in particular to the conjugate gradient (CG) method. We combine a detailed analysis of the spectrum of said precision matrices with results from random graph theory to show that CG-based methods applied to high-dimensional GLMMs typically achieve a fixed approximation error with a total cost that scales linearly with the number of parameters and observations. Numerical illustrations with both real and simulated data confirm the theoretical findings, while at the same time illustrating situations, such as nested structures, where CG-based methods struggle.
