From Global to Local Correlation: Geometric Decomposition of Statistical Inference

Pawel Gajer, Jacques Ravel

TL;DR

A geometric decomposition framework offering two strategies for partitioning inference problems into regional analyses on data-derived Riemannian graphs, with Bayesian posterior sampling providing credible intervals.

Abstract

Understanding feature-outcome associations in high-dimensional data remains challenging when relationships vary across subpopulations, yet standard methods assuming global associations miss context-dependent patterns, reducing statistical power and interpretability. We develop a geometric decomposition framework offering two strategies for partitioning inference problems into regional analyses on data-derived Riemannian graphs. Gradient flow decomposition uses path-monotonicity-validated discrete Morse theory to partition samples into gradient flow cells where outcomes exhibit monotonic behavior. Co-monotonicity decomposition utilizes vertex-level coefficients that provide context-dependent versions of the classical Pearson correlation: these coefficients measure edge-based directional concordance between outcome and features, or between feature pairs, defining embeddings of samples into association space. These embeddings induce Riemannian k-NN graphs on which biclustering identifies co-monotonicity cells (coherent regions) and feature modules. This extends naturally to multi-modal integration across multiple feature sets. Both strategies apply independently or jointly, with Bayesian posterior sampling providing credible intervals.
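
The idea of edge-based directional concordance can be illustrated with a minimal sketch. This is a simplified, unit-weighted sign-concordance variant written for illustration only; the function name `comonotonicity`, the adjacency-dictionary input, and the averaging rule are assumptions and not the authors' exact definition. At each vertex it averages, over incident edges, whether the outcome and a feature change in the same direction.

```python
import numpy as np

def comonotonicity(adj, y, z):
    """Hypothetical unit-weighted vertex-level concordance of y and z.

    adj : dict mapping each vertex index to a list of neighbor indices
    y, z: 1-D arrays of values at the vertices (e.g. outcome and one feature)
    Returns an array in [-1, 1]: +1 when y and z move together along every
    incident edge, -1 when they always move in opposite directions.
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    coeffs = np.zeros(len(y))
    for v, nbrs in adj.items():
        if not nbrs:
            continue  # isolated vertex: leave the coefficient at 0
        dy = y[nbrs] - y[v]  # edge differences of the outcome
        dz = z[nbrs] - z[v]  # edge differences of the feature
        coeffs[v] = np.mean(np.sign(dy) * np.sign(dz))
    return coeffs

# Path graph 0-1-2-3 with a feature that rises with the outcome.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
y = np.array([0.0, 1.0, 2.0, 3.0])
print(comonotonicity(adj, y, 2 * y))   # all +1
print(comonotonicity(adj, y, -y))      # all -1
```

Stacking such coefficient vectors over many features gives one row per sample, which is the kind of association profile the abstract describes as an embedding into association space.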

Paper Structure

This paper contains 49 sections, 75 equations, 5 figures, 1 algorithm.

Figures (5)

  • Figure 1: Domain decomposition via gradient flow. Panel A: One-dimensional function with local minima ($m_1$, $m_2$, $m_3$) and maxima ($M_1$, $M_2$), showing gradient directions and partitioning into ascending/descending basins and gradient flow cells. Panel B: Gradient flow graph with vertices as critical points (labeled with function values) and edges connecting minimum-maximum pairs, enabling monotonic statistical modeling within each cell.
  • Figure 2: Gradient flow decomposition of a two-Gaussian mixture on $[0,1]^2$. Top-left: Continuous function with critical points, gradient trajectories, and cell boundaries. Top-right: Random sample with color-coded function values. Bottom-right: k-nearest neighbor graph (k = 36) with selected gradient trajectories. Bottom-left: Gradient flow complex where solid edges connect minimum-maximum pairs and dashed edges represent minimum-minimum cells with saddle points; edges shown for cells with at least 25 points.
  • Figure 3: Co-monotonicity association profiles in vaginal microbiome data. Heatmap shows vertex-level smoothed (see Section 4.5) co-monotonicity coefficients between spontaneous preterm birth outcome and bacterial phylotype abundances across samples from pregnant women (rows: samples, columns: phylotypes). Hierarchical clustering on both axes reveals coherent blocks: samples (rows) group by shared association patterns, while phylotypes (columns) cluster by co-varying relationships with the outcome. Red indicates positive co-monotonicity (phylotype and outcome increase together), blue indicates negative co-monotonicity (inverse relationship), and yellow indicates independence. The block structure demonstrates how biclustering on these association profiles identifies co-monotonicity cells—regions where specific feature modules exhibit consistent outcome associations.
  • Figure 4: Scale artifacts in co-monotonicity coefficients are eliminated by geometric smoothing. (A) Raw correlation-type coefficients computed with derivative versus unit weighting for sPTB prevalence and 106 phylotypes across 224,402 vertex-phylotype pairs. Red points indicate pairs with large discrepancies ($|\text{difference}| > 0.75$). (B) The same coefficients after geometric smoothing via graph Laplacian filtering. Smoothing eliminates scale-dependent discrepancies, demonstrating that both weighting schemes recover nearly identical association structure at consistent geometric scale.
  • Figure 5: Raw unit-weighted coefficients do not match smoothed derivative-weighted coefficients, demonstrating that smoothing fundamentally transforms both weighting schemes. Comparison of raw unit-weighted co-monotonicity coefficients versus smoothed derivative-weighted coefficients for sPTB prevalence and 106 phylotypes across 224,402 vertex-phylotype pairs. The wide scatter (Gini mean difference: 0.322) contrasts with the tight agreement between smoothed unit-weighted and smoothed derivative-weighted coefficients (Figure 4B), confirming that geometric smoothing modifies association structure rather than merely adjusting one scheme to match the other.
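
The captions for Figures 4 and 5 refer to geometric smoothing via graph Laplacian filtering. The sketch below shows one common low-pass filter of that family, $(I + tL)^{-1} f$ with the combinatorial Laplacian $L = D - A$. The filter form, the function name `laplacian_smooth`, and the parameter `t` are assumptions chosen for illustration; the summary does not state which filter the authors use.

```python
import numpy as np

def laplacian_smooth(A, f, t=1.0):
    """Hypothetical low-pass graph filter: solve (I + t L) g = f.

    A : symmetric (n, n) adjacency or edge-weight matrix
    f : length-n signal on the vertices (e.g. raw coefficients)
    t : smoothing strength; t = 0 returns f unchanged
    """
    A = np.asarray(A, dtype=float)
    f = np.asarray(f, dtype=float)
    L = np.diag(A.sum(axis=1)) - A          # combinatorial Laplacian D - A
    return np.linalg.solve(np.eye(len(A)) + t * L, f)

# Path graph 0-1-2 with a spike at the last vertex.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
f = np.array([0.0, 0.0, 3.0])
g = laplacian_smooth(A, f, t=1.0)
print(g)  # the spike is spread over neighbors; the total is preserved
```

For a symmetric graph this filter keeps the sum of the signal fixed and leaves constant signals untouched, so it damps vertex-to-vertex fluctuations without shifting the overall level.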

Theorems & Definitions (1)

  • Definition 1: Dirichlet Resampling for Posterior Uncertainty
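
Only the title of Definition 1 appears here, so the following is a generic sketch of Dirichlet resampling in the style of the Bayesian bootstrap, not the authors' definition. It assumes flat Dirichlet(1, ..., 1) weights over samples and uses a weighted mean as the statistic; the function name `dirichlet_credible_interval` and all parameters are hypothetical.

```python
import numpy as np

def dirichlet_credible_interval(values, n_draws=2000, level=0.95, seed=0):
    """Hypothetical Bayesian-bootstrap interval for a weighted mean.

    Each draw assigns the samples random weights from Dirichlet(1, ..., 1)
    and recomputes the statistic; quantiles of the draws give the interval.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # One row of sample weights per posterior draw; each row sums to 1.
    weights = rng.dirichlet(np.ones(len(values)), size=n_draws)
    draws = weights @ values                 # weighted mean for every draw
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(draws, [alpha, 1.0 - alpha])
    return lo, hi

lo, hi = dirichlet_credible_interval(np.arange(100.0))
print(lo, hi)  # an interval around the sample mean 49.5
```

Replacing the weighted mean with any weight-aware statistic, such as a vertex-level coefficient recomputed under the sampled weights, gives an interval for that statistic in the same way.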