On Conditional Independence Graph Learning From Multi-Attribute Gaussian Dependent Time Series
Jitendra K. Tugnait
TL;DR
This work develops a unified framework for learning the conditional independence graph (CIG) of high-dimensional, multi-attribute Gaussian time series by formulating a penalized log-likelihood in the frequency domain. It analyzes both convex (sparse-group lasso) and non-convex (log-sum, SCAD) penalties and establishes theoretical guarantees of consistency and graph recovery under high-dimensional scaling without requiring incoherence conditions. The optimization uses an ADMM-based approach, with a local linear approximation to handle the non-convex penalties, and the tuning parameters are selected via the Bayesian information criterion. Empirical results on synthetic data and a Beijing air-quality dataset demonstrate that non-convex penalties, particularly log-sum, can yield sparser, more accurate CIGs and favorable trade-offs between recovery performance and computation.
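To illustrate the local linear approximation mentioned above, the following is a minimal sketch, not the paper's implementation. It assumes the common parameterizations of the log-sum penalty, λ ε ln(1 + |t|/ε), and of the SCAD penalty with shape parameter a > 2. Under a local linear approximation, the non-convex penalty is replaced at each iteration by a weighted lasso term whose weight is the penalty's derivative evaluated at the previous estimate. The function names and default values (`eps`, `a`) are hypothetical choices for illustration.

```python
import numpy as np

def log_sum_weight(t, lam, eps=1e-3):
    """Derivative of lam * eps * log(1 + |t|/eps) with respect to |t|.

    Assumed parameterization; eps is an illustrative default.
    """
    return lam * eps / (eps + np.abs(t))

def scad_weight(t, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |t| (assumes a > 2).

    Equals lam for |t| <= lam, decays linearly to 0 at |t| = a * lam,
    and is 0 beyond that point.
    """
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1))

# Hypothetical previous-iteration magnitudes (e.g. norms of edge blocks).
prev = np.array([0.0, 0.05, 0.5, 5.0])
print(log_sum_weight(prev, lam=0.1))
print(scad_weight(prev, lam=0.1))
```

Both weights shrink as the previous estimate grows, so large entries are penalized less than under a plain lasso, while entries near zero keep the full weight λ.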
Abstract
Estimation of the conditional independence graph (CIG) of high-dimensional multivariate Gaussian time series from multi-attribute data is considered. Existing methods for graph estimation from such data are based on single-attribute models, where one associates a scalar time series with each node. In multi-attribute graphical models, each node represents a random vector or vector time series. In this paper we provide a unified theoretical analysis of multi-attribute graph learning for dependent time series using a penalized log-likelihood objective function formulated in the frequency domain using the discrete Fourier transform of the time-domain data. We consider both convex (sparse-group lasso) and non-convex (log-sum and SCAD group penalties) penalty/regularization functions. We establish sufficient conditions in a high-dimensional setting for consistency (convergence of the inverse power spectral density to its true value in the Frobenius norm), local convexity when using non-convex penalties, and graph recovery. We do not impose any incoherence or irrepresentability condition for our convergence results. We also empirically investigate selection of the tuning parameters based on the Bayesian information criterion, and illustrate our approach using numerical examples utilizing both synthetic and real data.
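As a rough illustration of the three penalty families named above, the sketch below evaluates the lasso, log-sum, and SCAD penalties, and applies one of them to the Frobenius norm of a matrix block, which is how a group penalty acts on the sub-block linking two multi-attribute nodes. The parameterizations are the commonly used textbook forms and are assumed here, not taken from the paper. The function names, the defaults `eps` and `a`, and the example block are hypothetical.

```python
import numpy as np

def lasso_penalty(t, lam):
    """Convex penalty: lam * |t|."""
    return lam * np.abs(t)

def log_sum_penalty(t, lam, eps=1e-3):
    """Non-convex log-sum penalty: lam * eps * log(1 + |t|/eps) (assumed form)."""
    return lam * eps * np.log1p(np.abs(t) / eps)

def scad_penalty(t, lam, a=3.7):
    """Non-convex SCAD penalty in its standard three-piece form (assumes a > 2)."""
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t                                                # |t| <= lam
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))        # lam < |t| <= a*lam
    const = lam**2 * (a + 1) / 2                                    # |t| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))

def group_penalty(block, penalty, **kwargs):
    """Apply a scalar penalty to the Frobenius norm of a (possibly complex) block."""
    return float(penalty(np.linalg.norm(block), **kwargs))

# Hypothetical 2x2 complex block between two nodes with two attributes each.
block = np.array([[0.3 + 0.1j, 0.0], [0.0, -0.2j]])
for name, pen in [("lasso", lasso_penalty),
                  ("log-sum", log_sum_penalty),
                  ("SCAD", scad_penalty)]:
    print(name, group_penalty(block, pen, lam=0.1))
```

The lasso penalty grows linearly without bound, whereas the log-sum penalty grows only logarithmically and SCAD levels off to a constant, which is why the non-convex choices penalize large, clearly non-zero blocks less heavily.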
