Modeling massive highly-multivariate nonstationary spatial data with the basis graphical lasso

Mitchell Krock, William Kleiber, Dorit Hammerling, Stephen Becker

TL;DR

This paper tackles the challenge of modeling massive, highly-multivariate nonstationary spatial data by combining basis expansions with sparse Gaussian graphical modeling of the basis weights. The authors extend the basis graphical lasso to a multivariate setting and introduce a fusion penalty across resolution levels. This penalty encourages stable, interpretable cross-process dependencies, and orthogonal bases keep the computation efficient. They describe an implementation strategy (initial unfused estimates, maximum likelihood updates via a DC/QUIC framework, error-variance estimation, and cross-validation) and illustrate the approach on a 40-variable climate ensemble with thousands of spatial locations, revealing meaningful dependency patterns and nonstationarity across scales. The resulting framework yields scalable, interpretable covariance and cross-covariance estimates for high-dimensional spatial data, with potential extensions to full joint precision estimation and space-time settings.
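As a rough illustration of what an "unfused" initial estimate could look like, the sketch below solves one independent graphical lasso problem per resolution level, given each level's empirical $p \times p$ covariance. It uses plain proximal gradient descent (ISTA) with backtracking, a swapped-in technique rather than the authors' DC/QUIC procedure. It also ignores the error variance by treating the projected weights as if they were observed directly. The function names (`graphical_lasso_ista`, `unfused_estimates`) are hypothetical.

```python
import numpy as np

def _smooth_part(Q, S):
    """Return -log det(Q) + tr(S Q), or +inf if Q is not positive definite."""
    try:
        C = np.linalg.cholesky(Q)
    except np.linalg.LinAlgError:
        return np.inf
    return -2.0 * np.sum(np.log(np.diag(C))) + np.sum(S * Q)

def _soft_threshold_offdiag(A, thresh):
    """Shrink off-diagonal entries toward zero by `thresh`; keep the diagonal unpenalized."""
    out = np.sign(A) * np.maximum(np.abs(A) - thresh, 0.0)
    np.fill_diagonal(out, np.diag(A))
    return out

def graphical_lasso_ista(S, lam, max_iter=1000, tol=1e-7):
    """Proximal-gradient (ISTA) sketch for
        min_Q  -log det Q + tr(S Q) + lam * sum_{i != j} |Q_ij|.
    Not QUIC: a simple stand-in for illustration only."""
    Q = np.diag(1.0 / (np.diag(S) + lam))  # positive definite starting point
    f = _smooth_part(Q, S)
    t = 1.0
    for _ in range(max_iter):
        grad = S - np.linalg.inv(Q)
        grad = 0.5 * (grad + grad.T)       # guard against round-off asymmetry
        t = min(1.0, 2.0 * t)              # let the step size grow again
        while True:                        # backtrack: stay PD and ensure descent
            Q_new = _soft_threshold_offdiag(Q - t * grad, t * lam)
            D = Q_new - Q
            f_new = _smooth_part(Q_new, S)
            if t < 1e-12 or f_new <= f + np.sum(grad * D) + np.sum(D * D) / (2.0 * t):
                break
            t *= 0.5
        Q, f = Q_new, f_new
        if np.max(np.abs(D)) < tol:
            break
    return Q

def unfused_estimates(S_levels, lam):
    """Hypothetical helper: one independent graphical lasso per resolution level."""
    return [graphical_lasso_ista(S_l, lam) for S_l in S_levels]
```

With $\lambda = 0$ the solver should return (approximately) the inverse of `S`, while a large $\lambda$ zeroes every off-diagonal entry. A fusion penalty, as described above, would instead couple neighboring levels rather than solving them separately.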

Abstract

We propose a new modeling framework for highly-multivariate spatial processes that synthesizes ideas from recent multiscale and spectral approaches with graphical models. The basis graphical lasso writes a univariate Gaussian process as a linear combination of basis functions weighted with entries of a Gaussian graphical vector whose graph is estimated from optimizing an $\ell_1$ penalized likelihood. This paper extends the setting to a multivariate Gaussian process where the basis functions are weighted with Gaussian graphical vectors. We motivate a model where the basis functions represent different levels of resolution and the graphical vectors for each level are assumed to be independent. Using an orthogonal basis grants linear complexity and memory usage in the number of spatial locations, the number of basis functions, and the number of realizations. An additional fusion penalty encourages a parsimonious conditional independence structure in the multilevel graphical model. We illustrate our method on a large climate ensemble from the National Center for Atmospheric Research's Community Atmosphere Model that involves 40 spatial processes.
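To see why an orthogonal basis keeps the cost linear, write the model in notation assumed here (not taken from the paper): $y_j(\mathbf{s}) = \sum_{\ell=1}^{L}\phi_\ell(\mathbf{s})\,w_{\ell j} + \varepsilon_j(\mathbf{s})$ for processes $j=1,\dots,p$. The level-$\ell$ weight vectors $\mathbf{w}_\ell=(w_{\ell 1},\dots,w_{\ell p})^\top \sim N(\mathbf{0},\mathbf{Q}_\ell^{-1})$ are independent across levels. If the basis matrix $\boldsymbol{\Phi}$ has orthonormal columns, then $\boldsymbol{\Phi}^\top\mathbf{y}_j = \mathbf{w}_j + \boldsymbol{\Phi}^\top\boldsymbol{\varepsilon}_j$, so a single matrix product reduces the data to $L$ separate $p$-dimensional problems. The numpy sketch below makes two further assumptions. The errors are i.i.d. Gaussian with a per-variable variance `tau2`, independent across variables. A QR-orthonormalized random matrix serves as a hypothetical stand-in for the basis (for example, EOFs). All function names are illustrative.

```python
import numpy as np

def orthonormal_basis(n_loc, L, seed=0):
    """Hypothetical stand-in for an orthogonal basis such as EOFs:
    the reduced QR factor of a random Gaussian matrix (columns orthonormal)."""
    rng = np.random.default_rng(seed)
    Phi, _ = np.linalg.qr(rng.standard_normal((n_loc, L)))
    return Phi  # shape (n_loc, L), Phi.T @ Phi = I_L

def simulate(n_real, p, Phi, Q_list, tau2, seed=1):
    """Draw y_j = Phi w_j + eps_j for j = 1..p, with level weights
    w_l ~ N(0, Q_l^{-1}) independent across levels and eps_j ~ N(0, tau2[j] I)."""
    rng = np.random.default_rng(seed)
    n_loc, L = Phi.shape
    W = np.empty((n_real, p, L))
    for l, Q in enumerate(Q_list):
        W[:, :, l] = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Q), size=n_real)
    eps = rng.standard_normal((n_real, p, n_loc)) * np.sqrt(tau2)[None, :, None]
    return W @ Phi.T + eps, W  # (n_real, p, n_loc) data and the true weights

def project_onto_levels(Y, Phi):
    """Map data of shape (n_real, p, n_loc) to per-level weights (L, n_real, p).
    One matrix product: cost O(n_real * p * n_loc * L), linear in each factor."""
    return np.transpose(Y @ Phi, (2, 0, 1))

def level_covariances(Z):
    """Empirical p x p second-moment matrix of the projected weights at each level
    (the model is mean zero). Under the assumed errors its expectation is
    Q_l^{-1} + diag(tau2)."""
    n_real = Z.shape[1]
    return np.einsum("lri,lrj->lij", Z, Z) / n_real
```

Because the projected noise stays white when $\boldsymbol{\Phi}^\top\boldsymbol{\Phi}=\mathbf{I}$, each level reduces to a small $p \times p$ covariance problem. That separability is what makes per-level graphical model estimates cheap even when the number of locations is large.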

Paper Structure

This paper contains 13 sections, 22 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: First two pooled EOFs of the standardized CAM data. The two EOFs account for 1.83% and 1.59% of the total variability of all 40 variables, respectively. Rightmost plot shows the cumulative percentage of variability explained by the EOFs, with $L=2000$ EOFs capturing 97.2% of the total variance.
  • Figure 2: Marginal precision estimates (i.e., diagonals of $\hat{\mathbf{Q}}_1,\dots,\hat{\mathbf{Q}}_{2000}$) shown by level for various penalty choices. Log scale on $y$-axis. Bottom row shows a subset of six variables from the top row. The grey pressure variable is noticeably smoother than the rest after regularization is added.
  • Figure 3: Illustrating how graphical model neighborhoods behave for various penalty choices. Left column counts the nonzeros of $\hat{\mathbf{Q}}_1,\dots,\hat{\mathbf{Q}}_{2000}$ by level. Center and right columns show how the neighbors of BURDENSEASALT and PS change over level.
  • Figure 4: Example estimated graphical models for $\lambda=20$. Low-level graphs are noisy, with the Level 1 graph containing 38.2% of all possible connections. Higher-level graphs show reasonable variable clusters until, eventually, no graph edges remain.
  • Figure 5: The first level at which a variable becomes (and remains) independent. Variables are sorted by their implied independence level for $\lambda=1$. Results for $\lambda=20$ are shown alongside these taller bars and tell a similar story for variable groups, but with independence reached at earlier levels.
  • ...and 4 more figures
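Figure 1 reports the share of total variance captured by pooled EOFs of the standardized data. The following numpy sketch shows one way to compute such curves, under assumptions the summary does not spell out. Each variable is standardized location by location across realizations. The standardized fields of all variables are then stacked as rows of one matrix (the "pooling"), and the EOFs are taken as that matrix's right singular vectors. The array layout and the function names `pooled_eofs` and `n_eofs_for` are hypothetical.

```python
import numpy as np

def pooled_eofs(X):
    """X: array (n_real, p, n_loc) holding n_real realizations of p spatial fields.
    Returns (eofs, frac, cum_frac): EOFs as columns of an (n_loc, k) matrix,
    the fraction of total variance per EOF, and its cumulative sum."""
    n_real, p, n_loc = X.shape
    # Assumed standardization: per variable and location, across realizations.
    Z = (X - X.mean(axis=0, keepdims=True)) / X.std(axis=0, keepdims=True)
    # Pool: every (realization, variable) field becomes one row.
    pooled = Z.reshape(n_real * p, n_loc)
    _, s, Vt = np.linalg.svd(pooled, full_matrices=False)
    var = s ** 2
    frac = var / var.sum()
    return Vt.T, frac, np.cumsum(frac)

def n_eofs_for(cum_frac, target):
    """Smallest number of leading EOFs whose cumulative fraction reaches `target`."""
    return int(np.searchsorted(cum_frac, target) + 1)
```

Applied to the CAM ensemble, `frac[:2]` and `cum_frac` would correspond to the quantities plotted in Figure 1. The figure reports that $L=2000$ EOFs capture 97.2% of the total variance.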