Faster Inference of Cell Complexes from Flows via Matrix Factorization

Til Spreuer, Josef Hoppe, Michael T. Schaub

TL;DR

The paper tackles inferring a 2D cell complex from edge-flow data on graphs by augmenting the graph with 2-cells so edge flows decompose into gradient and curl components. It introduces MFCI, a matrix-factorization–based heuristic that approximates the cell-inference problem more quickly than prior SPH methods, while maintaining competitive accuracy, especially in noisy settings. Through synthetic and real-world experiments, it demonstrates substantial speed-ups and favorable performance trade-offs, with potential for hybrid approaches. The work advances scalable topological signal processing by enabling efficient construction of interpretable, higher-order representations of edge flows.

Abstract

We consider the following inference problem: Given a set of edge-flow signals observed on a graph, lift the graph to a cell complex, such that the observed edge-flow signals can be represented as a sparse combination of gradient and curl flows on the cell complex. Specifically, we aim to augment the observed graph by a set of 2-cells (polygons encircled by closed, non-intersecting paths), such that the eigenvectors of the Hodge Laplacian of the associated cell complex provide a sparse, interpretable representation of the observed edge flows on the graph. Because prior work has shown that the general problem is NP-hard, we develop a novel matrix-factorization-based heuristic to solve it. Using computational experiments, we demonstrate that our new approach is significantly less computationally expensive than prior heuristics, while achieving only marginally worse performance in most settings. In fact, we find that in noisy settings specifically, our new approach outperforms the previous state of the art in both solution quality and computational speed.
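
To make the decomposition in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of splitting an edge flow into gradient, curl, and harmonic parts by least-squares projection. The function name `hodge_decompose`, the toy graph, and the symbols `B1` (node-to-edge incidence matrix) and `B2` (edge-to-2-cell boundary matrix) are assumptions chosen for illustration. It shows how adding a 2-cell moves a circulating flow out of the harmonic remainder and into the curl component.

```python
import numpy as np

def hodge_decompose(B1, B2, f):
    """Split edge flow f into gradient, curl, and harmonic parts.

    B1: nodes x edges incidence matrix, B2: edges x 2-cells boundary matrix.
    Illustrative helper only; names and conventions are assumptions.
    """
    # Gradient part: least-squares projection of f onto im(B1^T).
    p, *_ = np.linalg.lstsq(B1.T, f, rcond=None)
    grad = B1.T @ p
    # Curl part: projection onto im(B2); zero when there are no 2-cells.
    if B2.shape[1] == 0:
        curl = np.zeros_like(f)
    else:
        w, *_ = np.linalg.lstsq(B2, f, rcond=None)
        curl = B2 @ w
    # Harmonic part: whatever neither projection explains.
    harm = f - grad - curl
    return grad, curl, harm

# Toy graph (hypothetical): two triangles sharing the edge (1, 2).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
B1 = np.zeros((4, len(edges)))
for j, (u, v) in enumerate(edges):
    B1[u, j], B1[v, j] = -1.0, 1.0  # edge oriented from u to v

# One 2-cell with boundary 0 -> 1 -> 2 -> 0 (edge (0, 2) is traversed backwards).
B2_one = np.array([[1.0], [-1.0], [1.0], [0.0], [0.0]])
B2_none = np.zeros((len(edges), 0))  # the plain graph, no 2-cells yet

# A flow made of a node-potential (gradient) part plus a circulation around the cell.
f = B1.T @ np.array([0.0, 1.0, 2.0, 3.0]) + 2.0 * B2_one[:, 0]

_, _, harm_before = hodge_decompose(B1, B2_none, f)
grad, curl, harm_after = hodge_decompose(B1, B2_one, f)
print(np.linalg.norm(harm_before), np.linalg.norm(harm_after))
```

In this toy setting the harmonic norm drops to numerically zero once the cell is added, which is the quantity a cell-inference heuristic tries to reduce with as few cells as possible.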

Paper Structure

This paper contains 6 sections, 5 equations, 5 figures, 1 algorithm.

Figures (5)

  • Figure 1: Overview of the complete inference approach. (a) shows an overview of our Cell Inference Algorithm, adapted and modified from hoppe2023representing. Like the original algorithm, our matrix-based alternative takes a graph with flows on its edges as an input (1) and iteratively adds $2$-cells. Both approaches project the flows into the harmonic space (2). Our approach introduces a matrix-factorization-based heuristic for finding candidates (3) that also makes the individual evaluation of the candidates (4) optional in practice. Furthermore, our approach can add multiple cells (instead of one) in each iteration (5). (b) shows the concept of the novel candidate heuristic in MFCI. The harmonic flows are decomposed into two matrices, resembling a relaxed version of flows generated by a boundary matrix. Thus, the left matrix can be discretized to one cell candidate per column. Pseudocode of steps 2-5 is given in Algorithm 1.
  • Figure 2: MFCI with deterministic candidate heuristic and SPH on an Erdős–Rényi graph with $n=40$, $p=0.9$, 50 sampled 2-cells, 64 flows, and edge noise with $\sigma = 0.3$. "1oo8" means that the best cell of 8 candidates is added in each iteration, and "8oo-1" adds all 8 cells that are inferred in each iteration. In (a), the SVD is the theoretical mathematical optimum. In (b), SVD, Construction, and Random have no corresponding time because they are shown only for reference.
  • Figure 3: MFCI (random walk heuristic) vs SPH on Taxi Set with 128 flows. Max Spanning Trees evaluates only 1 candidate per iteration. The SVD shows the approximation loss of $\mathbf{F}$ using the mathematically optimal approximation, without the constraints on the matrices required by problem \ref{eq:problem-variant-matfact}. Subfigure (a) shows the size of the projected harmonic flow as a function of the added 2-cells. Subfigure (b) shows the cumulative times.
  • Figure 4: Comparison of using explicit LSMR projection (exp) and the approximation via pseudoinverse (pinv) to calculate the remaining harmonic flow. Candidates obtained using the deterministic heuristic.
  • Figure 5: (a) Relative performance of MFCI (SVD, best candidate out of $5$, approx. calculation of harmonic flow, deterministic candidate heuristic) compared to SPH on a cell complex (CC) sampled from an Erdős–Rényi graph with $n=40$, $p=0.9$, 80 sampled 2-cells, and 64 flows. The flow and the noise were drawn from normal distributions with mean $\mu=0$; the former with a standard deviation $\sigma = 1$ and the latter according to the legend. We show the relative performance because the inferred cells and approximation error change with the noise level. The relative performance is calculated as $(r - a)/(r-b)$, where $r$ is the average error of a random algorithm, $a$ is the error of MFCI, and $b$ is the error of SPH. As such, a relative performance of $0$ is as good as adding random cells; a value of $1$ is as good as SPH. A value above $1$ indicates that MFCI outperforms SPH. (b) Different numbers of ranks and candidates for MFCI (ICA, 1oo_), showing negligible impact of rank on accuracy. Experiment settings from \ref{fig:lr-comp-on-er}.
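
The relative-performance measure in the Figure 5 caption can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `relative_performance` and the error values are made up for the example and are not results from the paper.

```python
def relative_performance(r, a, b):
    """Return (r - a) / (r - b) as defined in the Figure 5 caption.

    r: average error of a random algorithm, a: error of MFCI, b: error of SPH.
    """
    if r == b:
        raise ValueError("undefined when the SPH error equals the random baseline")
    return (r - a) / (r - b)

# Hypothetical error values, chosen only to illustrate the three regimes.
same_as_random = relative_performance(r=10.0, a=10.0, b=5.0)  # MFCI no better than random
same_as_sph = relative_performance(r=10.0, a=5.0, b=5.0)      # MFCI matches SPH
better_than_sph = relative_performance(r=10.0, a=4.0, b=5.0)  # MFCI has lower error than SPH
print(same_as_random, same_as_sph, better_than_sph)
```

Because both errors are normalized by the gap between the random baseline and SPH, the measure stays comparable across noise levels even when the absolute errors shift.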

Theorems & Definitions (3)

  • Definition 1: Simple cell complexes of dimension $2$
  • Remark 2: Orientation
  • Definition 3: Signal (or chain) space of a cell complex