Methods for Recovering Conditional Independence Graphs: A Survey

Harsh Shrivastava, Urszula Chajewska

TL;DR

This paper addresses the recovery of conditional independence (CI) graphs from data, focusing on undirected graphs whose edges encode partial correlations between features. It surveys two core formulations: estimating partial correlations directly via regression, and estimating sparse precision matrices with Graphical Lasso and its variants, including the deep unfolding models GLAD and uGLAD and tensor-based approaches such as TeraLasso and SyGlasso. To support heterogeneous data, it reviews covariance constructions for mixed datatypes and their impact on edge inference. It also discusses applications across life sciences, medical informatics, finance, and time series, and highlights the potential for integration with deep learning and tensor methods. The work provides a consolidated taxonomy, practical implementation guidance, and a roadmap intended to promote wider adoption of CI graph recovery as a mainstream data-exploration tool.

Abstract

Conditional Independence (CI) graphs are a type of probabilistic graphical model primarily used to gain insights about feature relationships. Each edge represents the partial correlation between the connected features, which gives information about their direct dependence. In this survey, we list the different methods and study the advances in techniques developed to recover CI graphs. We cover traditional optimization methods as well as recently developed deep learning architectures, along with their recommended implementations. To facilitate wider adoption, we include preliminaries that consolidate associated operations, for example techniques to obtain the covariance matrix for mixed datatypes.
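
To make the link between partial correlations and graph edges concrete, the following minimal sketch reads a CI graph off a precision (inverse covariance) matrix using the standard relation rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj). The 3x3 matrix `theta` and the helper names `partial_correlations` and `ci_edges` are hypothetical and chosen only for illustration; in practice the precision matrix would come from a sparse estimator such as Graphical Lasso, and the tolerance used to decide that an entry is zero is an assumption.

```python
import math

def partial_correlations(theta):
    """Convert a precision matrix (list of lists) into a partial-correlation matrix."""
    p = len(theta)
    rho = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                # Standard relation: rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj)
                rho[i][j] = -theta[i][j] / math.sqrt(theta[i][i] * theta[j][j])
    return rho

def ci_edges(theta, tol=1e-8):
    """Return (i, j, partial correlation) for each pair with a nonzero partial correlation."""
    rho = partial_correlations(theta)
    p = len(theta)
    return [
        (i, j, rho[i][j])
        for i in range(p)
        for j in range(i + 1, p)
        if abs(rho[i][j]) > tol  # tol is an illustrative threshold, not from the survey
    ]

# Hypothetical precision matrix for a chain 0 - 1 - 2:
# the (0, 2) entry is zero, so features 0 and 2 are not directly connected.
theta = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]

edges = ci_edges(theta)
for i, j, r in edges:
    print(f"edge {i} - {j}: partial correlation {r:.2f}")
```

For this matrix, the sketch reports edges between features 0 and 1 and between features 1 and 2, and none between 0 and 2. For Gaussian data, a zero partial correlation corresponds to conditional independence of the two features given all the others.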

Paper Structure

This paper contains 9 sections, 5 equations, 8 figures, and 1 table.

Figures (8)

  • Figure 1: Graph Recovery approaches. Methods used to recover Conditional Independence graphs are the focus of this survey. The recovered CI graph shows partial correlations between the feature nodes. The algorithms (leaf nodes) listed here are representative of the sub-category and the list is not exhaustive.
  • Figure 2: The recurrent unit GLADcell (taken from shrivastava2020glad).
  • Figure 3: [left] uGLAD graph for archaea at the family level in a collection of wastewater processing digesters. Edge color indicates the sign of the correlation (green: positive, red: negative), and edge weight corresponds to the correlation's strength (taken from shrivastava2022uglad). [right] CI graphs from a uGLAD model used to analyse lung cancer data from lcData.
  • Figure 4: Recovered graph structures for a sub-network of E. coli consisting of $43$ genes and $30$ interactions, with an increasing number of samples. GLAD was trained using ground truth from a synthetic gene expression data simulator. Increasing the number of samples reduces the FDR by discovering more true edges. Abbreviations: TPR, True Positive Rate; FPR, False Positive Rate; FDR, False Discovery Rate (taken from shrivastava2019glad).
  • Figure 5: The CI graph recovered by uGLAD for the Infant Mortality 2015 data from CDC CDC:InfantLinkedDatasets (taken from shrivastava2022neural).
  • ...and 3 more figures