Modeling massive highly-multivariate nonstationary spatial data with the basis graphical lasso
Mitchell Krock, William Kleiber, Dorit Hammerling, Stephen Becker
TL;DR
This paper tackles the challenge of modeling massive, highly-multivariate nonstationary spatial data by combining basis expansions with sparse Gaussian graphical modeling of the basis weights. The authors extend the basis graphical lasso to a multivariate setting, introducing a fusion penalty across resolution levels to encourage stable, interpretable cross-process dependencies while maintaining computational efficiency through orthogonal bases. They provide an implementation strategy, including initial unfused estimates, maximum likelihood updates via a DC/QUIC framework, error-variance estimation, and cross-validation, and validate the approach on a 40-variable climate ensemble with thousands of spatial locations, revealing meaningful dependency patterns and nonstationarity across scales. The resulting framework offers scalable, interpretable estimates of covariances and cross-covariances for high-dimensional spatial data, with potential extensions to full joint precision estimation and space-time settings.
Abstract
We propose a new modeling framework for highly-multivariate spatial processes that synthesizes ideas from recent multiscale and spectral approaches with graphical models. The basis graphical lasso writes a univariate Gaussian process as a linear combination of basis functions weighted with entries of a Gaussian graphical vector whose graph is estimated by optimizing an $\ell_1$-penalized likelihood. This paper extends the setting to a multivariate Gaussian process where the basis functions are weighted with Gaussian graphical vectors. We motivate a model where the basis functions represent different levels of resolution and the graphical vectors for each level are assumed to be independent. Using an orthogonal basis grants linear complexity and memory usage in the number of spatial locations, the number of basis functions, and the number of realizations. An additional fusion penalty encourages a parsimonious conditional independence structure in the multilevel graphical model. We illustrate our method on a large climate ensemble from the National Center for Atmospheric Research's Community Atmosphere Model that involves 40 spatial processes.
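
The ingredients named in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`offdiag_l1`, `fused_penalty`, `project`, `levelwise_cov`), the tuning parameters `lam` and `rho`, and the array layout are all hypothetical, and the penalty is assumed to act on off-diagonal entries, with the fusion term comparing precision matrices at adjacent levels. With an orthonormal basis matrix `Phi`, the data enter only through the projections `Phi.T @ Y`, which give one small p x p sample covariance per level.

```python
import numpy as np


def offdiag_l1(A):
    """Sum of absolute values of the off-diagonal entries of a square matrix."""
    return np.abs(A).sum() - np.abs(np.diag(A)).sum()


def fused_penalty(Q, lam, rho):
    """Assumed penalty form: sparsity on each level plus fusion across levels.

    Q has shape (L, p, p): one p x p precision matrix per level.
    lam weights the l1 sparsity term and rho weights the fusion term
    between adjacent levels (both names are illustrative).
    """
    n_levels = len(Q)
    sparsity = sum(offdiag_l1(Q[l]) for l in range(n_levels))
    fusion = sum(offdiag_l1(Q[l] - Q[l + 1]) for l in range(n_levels - 1))
    return lam * sparsity + rho * fusion


def project(Y, Phi):
    """Project data onto an orthonormal basis.

    Y has shape (m, n, p): m realizations, n locations, p processes.
    Phi has shape (n, L) with orthonormal columns.
    Returns an (m, L, p) array whose slice [i] equals Phi.T @ Y[i].
    """
    return np.einsum("nl,mnp->mlp", Phi, Y)


def levelwise_cov(W_hat):
    """Zero-mean sample covariance across processes at each level: (L, p, p)."""
    m = W_hat.shape[0]
    return np.einsum("mlp,mlq->lpq", W_hat, W_hat) / m


# Small synthetic usage. A random orthonormal basis from a QR factorization
# stands in for whatever multiresolution basis is used in practice.
rng = np.random.default_rng(0)
m, n, L, p = 20, 200, 5, 3
Phi, _ = np.linalg.qr(rng.standard_normal((n, L)))
Y = rng.standard_normal((m, n, p))
S = levelwise_cov(project(Y, Phi))        # one p x p matrix per level
Q0 = np.stack([np.eye(p)] * L)            # identity precisions as a starting guess
penalty_at_start = fused_penalty(Q0, lam=0.1, rho=0.2)
```

In a full estimator, the level-wise matrices in `S` would feed a penalized Gaussian likelihood for each level's precision matrix; the optimization itself is omitted here.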
