a-TMFG: Scalable Triangulated Maximally Filtered Graphs via Approximate Nearest Neighbors

Lionel Yelibi

TL;DR

This paper presents the Approximate Triangulated Maximally Filtered Graph (a-TMFG) algorithm, a novel approach, inspired by TMFG, to scaling the construction of artificial graphs from data. It provides a parsimonious way to construct graphs for use cases where graphs serve as input to supervised and unsupervised learning but no natural graph exists.

Abstract

The traditional Triangulated Maximally Filtered Graph (TMFG) construction requires pre-computing and storing a dense correlation matrix, which limits its applicability to small and medium-sized datasets. Here we identify the key memory and runtime complexity challenges of using TMFG at scale. We then present the Approximate Triangulated Maximally Filtered Graph (a-TMFG) algorithm, a novel approach, inspired by TMFG, to scaling the construction of artificial graphs from data. The method uses k-Nearest Neighbors Graphs (kNNG) for the initial construction and implements a memory management strategy that searches for and estimates missing correlations on the fly. This provides representations that keep the combinatorial explosion under control. The algorithm is tested for robustness to its parameters and to noise, and is evaluated on datasets with millions of observations. The method provides a parsimonious way to construct graphs for use cases where graphs serve as input to supervised and unsupervised learning but no natural graph exists.
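The memory argument in the abstract can be made concrete: a dense float64 correlation matrix over N nodes needs 8N² bytes, so N = 10^6 already requires 8 × 10^12 bytes (8 TB). The sketch below is a minimal illustration under stated assumptions, not the author's implementation. It shows the two ingredients the abstract names: a kNN graph built from row blocks so the full matrix is never materialized, and correlations computed on demand behind a bounded cache. Assumptions: brute-force exact kNN stands in for the approximate nearest-neighbor index the method relies on; the LRU cache is a hypothetical memory strategy that computes exact correlations rather than estimating them; `LazyCorrelation`, `knn_graph`, `dense_corr_bytes`, `cache_size` and `block` are illustrative names.

```python
import numpy as np
from functools import lru_cache


def dense_corr_bytes(n_nodes: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed for a dense n_nodes x n_nodes correlation matrix (float64 by default)."""
    return n_nodes * n_nodes * bytes_per_entry


class LazyCorrelation:
    """Pearson correlations computed on demand and kept in a bounded LRU cache.

    Hypothetical stand-in for an on-the-fly correlation strategy: it computes
    exact values when asked instead of estimating them.
    """

    def __init__(self, X: np.ndarray, cache_size: int = 100_000):
        # X has shape (n_nodes, n_samples). Centre each row and scale it to unit
        # length once, so the dot product of two rows is their Pearson correlation.
        # Assumes no row is constant (zero variance).
        Z = X - X.mean(axis=1, keepdims=True)
        self.Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
        self._cached = lru_cache(maxsize=cache_size)(self._compute)

    def _compute(self, i: int, j: int) -> float:
        return float(self.Z[i] @ self.Z[j])

    def __call__(self, i: int, j: int) -> float:
        # Order the pair so (i, j) and (j, i) share one cache entry.
        return self._cached(min(i, j), max(i, j))


def knn_graph(Z: np.ndarray, k: int, block: int = 1024) -> set[tuple[int, int]]:
    """Undirected kNN graph by brute force over row blocks (exact, not approximate).

    Only a (block x n) slice of similarities exists at any time, never the full n x n matrix.
    Requires k < n.
    """
    n = Z.shape[0]
    edges: set[tuple[int, int]] = set()
    for start in range(0, n, block):
        sims = Z[start:start + block] @ Z.T               # correlations of this block vs all nodes
        rows = np.arange(sims.shape[0])
        sims[rows, start + rows] = -np.inf                 # a node is not its own neighbour
        nbrs = np.argpartition(-sims, k, axis=1)[:, :k]    # k most correlated per row (unordered)
        for r, row in enumerate(nbrs):
            i = start + r
            for j in map(int, row):
                edges.add((min(i, j), max(i, j)))
    return edges


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((2_000, 100))                  # 2,000 nodes, 100 observations each
    corr = LazyCorrelation(X)
    G = knn_graph(corr.Z, k=5)
    print(len(G), corr(3, 7))
    print(dense_corr_bytes(1_000_000) / 1e12, "TB for a dense 10^6 x 10^6 float64 matrix")
```

Normalizing each row once turns every Pearson correlation into a single dot product, which is what keeps both the blockwise neighbor search and the on-demand lookups cheap.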

Paper Structure (13 sections, 6 equations, 8 figures, 1 algorithm)

Figures (8)

  • Figure 1: Face counts against the location $L_j$ of a given face $f_j$ in the face universe $\mathcal{F}$ used in the TMFG algorithm. Most of the selected faces are found at the end of the sequence of faces (on the right, near location $L_j=0$) rather than near the beginning (near location $L_j=-1$). This implies that scalability can be improved by focusing on the exploration frontier at the right of the plot and forgetting unnecessary faces (see the section on scalability). This is what the "approximate" refinement of the TMFG method (a-TMFG) exploits.
  • Figure 2: Network topologies derived from a 1-factor Gaussian mixture model. (a) Block-diagonal correlation matrix. (b) Exact TMFG constructed from the same dataset. The TMFG induces a sparse, hierarchical structure, highlighting the need for generative models capable of capturing short-range interactions (see the evaluation section).
  • Figure 3: Topological consistency of 1,000 exact TMFGs estimated from identically parameterized 1-factor models ($N=1000, K=5$). (a) Average shortest path between intra-cluster nodes. (b) Distribution of pairwise Jaccard scores between the resulting edge lists. The near-zero Jaccard overlap illustrates the instability of TMFGs on purely block-diagonal matrices, motivating the use of Gaussian Markov Random Fields (GMRFs) in the evaluation section.
  • Figure 4: UMAP projection of an a-TMFG constructed for $N=100{,}000$ nodes generated from a Gaussian Markov Random Field (see the GMRF section). The scalable algorithm recovers the ground-truth clusters while preserving the dendritic, maximal planar topology characteristic of exact TMFGs.
  • Figure 5: Average Jaccard similarity between the a-TMFG and the exact connectivity matrix across dataset sizes $N$ and values of the GMRF spatial dependence parameter $\alpha$. As analyzed in the section evaluating $\alpha$, the algorithm recovers the structure best when $0.2 \leq \alpha \leq 0.3$.
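Figures 1 to 3 refer to the exact TMFG, and Figures 3 and 5 compare graphs through the Jaccard score of their edge sets. The sketch below reproduces the shape of the Figure 3 experiment at a small scale: it samples data from a block-structured 1-factor model, builds an exact TMFG by the standard greedy rule (repeatedly insert the remaining vertex into the triangular face that maximizes the sum of its three similarities, splitting that face into three), and scores pairwise edge-set overlap. It is a minimal sketch under stated assumptions, not the author's code and not the a-TMFG algorithm. Assumptions: the seed rule (the 4 nodes with the largest total similarity) simplifies the original selection; the loading `rho`, series length `t`, node-to-block assignment, the smaller N, and the names `one_factor_sample`, `tmfg` and `jaccard` are all illustrative.

```python
import itertools
import numpy as np


def one_factor_sample(n: int, k: int, t: int, rho: float, rng: np.random.Generator) -> np.ndarray:
    """n series of length t in k blocks; within-block correlation rho, across blocks 0 (illustrative model)."""
    labels = np.arange(n) % k
    factors = rng.standard_normal((k, t))
    noise = rng.standard_normal((n, t))
    return np.sqrt(rho) * factors[labels] + np.sqrt(1.0 - rho) * noise


def tmfg(W: np.ndarray) -> set[tuple[int, int]]:
    """Exact TMFG by greedy vertex-into-face insertion on a dense similarity matrix (n >= 4)."""
    n = W.shape[0]
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    # Simplified seed: the 4 nodes with the largest total similarity to all others.
    seed = [int(v) for v in np.argsort(-W.sum(axis=1))[:4]]
    edges = {(min(a, b), max(a, b)) for a, b in itertools.combinations(seed, 2)}
    remaining = np.setdiff1d(np.arange(n), seed)

    def best(face, rem):
        # Gain of inserting v into face (a, b, c) is W[v, a] + W[v, b] + W[v, c].
        a, b, c = face
        gains = W[rem, a] + W[rem, b] + W[rem, c]
        i = int(np.argmax(gains))
        return int(rem[i]), float(gains[i])

    # For every face, cache its best remaining vertex and the corresponding gain.
    cand = {f: best(f, remaining) for f in itertools.combinations(seed, 3)}
    while remaining.size:
        face = max(cand, key=lambda f: cand[f][1])
        v, _ = cand.pop(face)                       # the chosen face is split, so drop it
        edges.update((min(v, u), max(v, u)) for u in face)
        remaining = remaining[remaining != v]
        if remaining.size == 0:
            break
        # Faces whose best vertex was just used must be re-scored.
        for f, (u, _) in list(cand.items()):
            if u == v:
                cand[f] = best(f, remaining)
        a, b, c = face
        for f in ((a, b, v), (b, c, v), (a, c, v)):  # the three new faces
            cand[f] = best(f, remaining)
    return edges


def jaccard(e1: set, e2: set) -> float:
    """Jaccard similarity |e1 & e2| / |e1 | e2| between two edge sets."""
    union = e1 | e2
    return len(e1 & e2) / len(union) if union else 1.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three independent datasets from the same model, mirroring Figure 3 at a small scale.
    graphs = [tmfg(np.corrcoef(one_factor_sample(200, 5, 500, 0.5, rng))) for _ in range(3)]
    for g1, g2 in itertools.combinations(graphs, 2):
        print(len(g1), round(jaccard(g1, g2), 3))
```

Each insertion adds exactly three edges to the six of the seed tetrahedron, so every graph has 3N − 6 edges. Because `best` reads arbitrary entries of the dense matrix `W`, memory grows as N², which is the cost the abstract identifies as the barrier to scaling.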