A Copula Graphical Model for Multi-Attribute Data using Optimal Transport
Qi Zhang, Bing Li, Lingzhou Xue
TL;DR
The paper addresses learning conditional independence among multi-attribute node vectors by introducing a semiparametric Cyclically Monotone Copula Gaussian Graphical Model (CMC-GGM) that maps each node's multivariate marginal to Gaussian scores via optimal transport (OT). A high-dimensional extension, the Projected CMC-GGM (PCMCG), mitigates the curse of dimensionality by restricting non-Gaussianity to a low-dimensional subspace and performing Gaussianization there. The authors establish identifiability, concentration, and selection-consistency results, and present practical estimation schemes, including OT-based transformation estimation and sparse precision estimation via thresholding, group graphical lasso, and neighborhood vector-on-vector selection. Simulations demonstrate robustness to non-Gaussian marginals, and real-data applications (gene/protein networks and color textures) show improved edge recovery and alignment with the underlying structure. Overall, the framework provides a flexible, scalable approach for learning multi-attribute graphs from non-Gaussian data using OT-driven marginal transformations.
Abstract
Motivated by modern data forms such as images and multi-view data, the multi-attribute graphical model aims to explore the conditional independence structure among vectors. Under the Gaussian assumption, the conditional independence between vectors is characterized by blockwise zeros in the precision matrix. To relax the restrictive Gaussian assumption, we introduce a novel semiparametric multi-attribute graphical model based on a new copula, the Cyclically Monotone Copula. This new copula treats the distributions of the node vectors as multivariate marginals and transforms them into Gaussian distributions using optimal transport theory. Since the model allows the node vectors to have arbitrary continuous distributions, it is more flexible than the classical Gaussian copula method, which performs coordinatewise Gaussianization. We establish concentration inequalities for the estimated covariance matrices and provide sufficient conditions for selection consistency of the group graphical lasso estimator. For the setting with high-dimensional attributes, we propose a Projected Cyclically Monotone Copula model to address the curse of dimensionality that arises from solving high-dimensional optimal transport problems. Numerical results on synthetic and real data show the efficiency and flexibility of our methods.
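To make the two ingredients of the abstract concrete, the following is a minimal Python sketch, not the authors' estimator. It assumes that the optimal transport map for each node can be approximated by a discrete assignment between the observed node vectors and an equal number of standard Gaussian draws, and that edges are read off from Frobenius norms of the blocks of an inverted sample covariance. The function names `ot_gaussianize` and `block_norms`, the simulated chain graph, and the coordinatewise exponential distortion are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def ot_gaussianize(x, rng):
    """Map rows of x (n samples, d attributes) to Gaussian scores.

    Assumption: the OT map is approximated by the optimal assignment,
    under squared Euclidean cost, between the sample and n reference
    points drawn from a standard d-variate Gaussian.
    """
    n, d = x.shape
    ref = rng.standard_normal((n, d))
    # Pairwise squared Euclidean costs between samples and reference points.
    cost = ((x[:, None, :] - ref[None, :, :]) ** 2).sum(axis=2)
    rows, cols = linear_sum_assignment(cost)
    scores = np.empty_like(ref)
    scores[rows] = ref[cols]
    return scores


def block_norms(precision, d):
    """Frobenius norm of each d-by-d block of a (p*d)-by-(p*d) matrix."""
    p = precision.shape[0] // d
    blocks = precision.reshape(p, d, p, d)
    return np.sqrt((blocks ** 2).sum(axis=(1, 3)))


# Illustrative data: 3 nodes with 2 attributes each, on a chain graph 0-1-2.
rng = np.random.default_rng(0)
d, p, n = 2, 3, 200
chain = np.array([[1.0, 0.4, 0.0],
                  [0.4, 1.0, 0.4],
                  [0.0, 0.4, 1.0]])
omega = np.kron(chain, np.eye(d))  # block (0, 2) is exactly zero
z = rng.multivariate_normal(np.zeros(p * d), np.linalg.inv(omega), size=n)
x = np.exp(z)  # non-Gaussian marginals via a monotone distortion

# Gaussianize each node vector separately, then estimate the precision matrix.
scores = np.hstack([ot_gaussianize(x[:, j * d:(j + 1) * d], rng)
                    for j in range(p)])
precision_hat = np.linalg.inv(np.cov(scores, rowvar=False))
edge_strength = block_norms(precision_hat, d)
```

In this sketch, a small entry of `edge_strength` suggests a missing edge between the corresponding pair of nodes. The plain matrix inverse stands in for the sparse estimators named above (thresholding, group graphical lasso, neighborhood selection), and the assignment step costs cubic time in the sample size, so it is suitable only for small illustrations.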
