Random Abstract Cell Complexes

Josef Hoppe, Michael T. Schaub

TL;DR

Random Abstract Cell Complexes (RCC) introduce a probabilistic framework that generates 2D abstract cell complexes by iteratively lifting a base Erdős–Rényi graph, with boundary-size-dependent inclusion probabilities for higher-dimensional cells. The work develops a spanning-tree-based approach to tame the cycle space and two practical approximate algorithms: one for counting cycles of a given length and one for sampling 2-cells with a target probability, enabling scalable null models and graph liftings for higher-order networks. The authors provide theoretical complexity results, analyze limitations, and validate the methods empirically across synthetic and real-world graphs, highlighting applications as null models, experiment baselines, and tools for sensitivity analyses in higher-order learning. Together, the results establish RCC as a versatile foundation for studying orientability, homology, and spectral properties of random CCs and for generating synthetic higher-order data with controllable cycle structure.

Abstract

We define a model for random (abstract) cell complexes (CCs), similar to the well-known Erdős-Rényi model for graphs and its extensions for simplicial complexes. To build a random cell complex, we first draw from an Erdős-Rényi graph, and consecutively augment the graph with cells for each dimension with a specified probability. As the number of possible cells increases combinatorially -- e.g., 2-cells can be represented as cycles, or permutations -- we derive an approximate sampling algorithm for this model limited to two-dimensional abstract cell complexes. As a basis for this algorithm, we first introduce a spanning-tree-based method that samples simple cycles and allows the efficient approximation of various properties, most notably the probability of occurrence of a given cycle. This approximation is of independent interest as it enables the approximation of a wide variety of cycle-related graph statistics using importance sampling. We use this to approximate the number of cycles of a given length on a graph, allowing us to calculate the sampling probability to arrive at a desired expected number of sampled 2-cells. The probability approximation also trivially leads to a sampling algorithm for 2-cells with a desired sampling probability. We provide some initial analysis of the properties of random CCs drawn from this model. We further showcase practical applications for our random CCs as null models, and in the context of (random) liftings of graphs to cell complexes. The cycle sampling, cycle count estimation, and combined cell sampling algorithms are available in the package `py-raccoon` on the Python Packaging Index.
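
The first two ingredients named in the abstract, drawing an Erdős-Rényi graph and sampling simple cycles through a spanning tree, can be illustrated with a minimal standard-library sketch. It is not the `py-raccoon` API: the function names are hypothetical, Wilson's algorithm is one standard way to draw a uniform spanning tree and is an assumption here, and "induced cycle" is assumed to mean the cycle closed by adding a single non-tree edge to the tree.

```python
import random
from itertools import combinations


def sample_er_graph(n, p, rng):
    """Erdős-Rényi G(n, p): keep each possible edge independently with probability p."""
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def wilson_spanning_tree(adj, rng, root=0):
    """Uniform spanning tree via Wilson's algorithm (assumes a connected graph).

    Returns a dict mapping each node to its parent; the root maps to None.
    """
    parent = {root: None}
    for start in adj:
        # Random walk until the current tree is hit; overwriting nxt[v]
        # on revisits erases loops implicitly.
        nxt = {}
        v = start
        while v not in parent:
            nxt[v] = rng.choice(sorted(adj[v]))
            v = nxt[v]
        # Attach the loop-erased path to the tree.
        v = start
        while v not in parent:
            parent[v] = nxt[v]
            v = nxt[v]
    return parent


def path_to_root(parent, v):
    """Nodes on the tree path from v up to the root, inclusive."""
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def induced_cycles(adj, parent):
    """One simple cycle per non-tree edge (u, v): the tree path u..v closed by that edge."""
    cycles = []
    for u in adj:
        for v in adj[u]:
            if u < v and parent[u] != v and parent[v] != u:
                pu, pv = path_to_root(parent, u), path_to_root(parent, v)
                # Drop shared ancestors above the lowest common ancestor.
                while len(pu) > 1 and len(pv) > 1 and pu[-2] == pv[-2]:
                    pu.pop()
                    pv.pop()
                # u .. lca, then down to v (lca not repeated).
                cycles.append(pu + pv[-2::-1])
    return cycles
```

On a connected graph with $n$ nodes and $m$ edges, every spanning tree induces exactly $m - n + 1$ such cycles. For example, the complete graph on 5 nodes (obtained with `p = 1.0`) has 10 edges, so each tree has 4 edges and induces 6 cycles. For `p < 1` the sampled graph may be disconnected, in which case this sketch would have to be run per connected component.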

Paper Structure

This paper contains 48 sections, 1 theorem, 25 equations, 19 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

On a graph sampled from an ER model, our occurrence probability approximation takes $\mathcal{O}(s \cdot m^{1+\varepsilon})$ time.

Figures (19)

  • Figure 1: Overview of the random abstract cell complex model. The model is defined through iterative liftings from $d$- to $(d+1)$-dimensional cell complexes. This makes it possible to fix a skeleton of the complex and only apply the random lifting for subsequent dimensions. In this paper, we focus on the lifting from one- to two-dimensional abstract cell complexes. For this, we present a novel, efficient approximate sampling algorithm (see Figure 4).
  • Figure 2: Approximation of the transition probabilities $\tau^{(i)}_c$ of the Laplacian Random Walk. The node to the left is always the previous node in the walk (or, for the first step, the target node). In the general case $\tau^{(i)}$, we know that one neighbor is visited and one neighbor is unvisited. All other neighbors are assumed to be equally likely, i.e., the $d-2$ remaining neighbors' probability is averaged over the visited, unvisited, and target nodes. In the first step, we know that the target node is a neighbor of the current node, reducing the probability of taking the exact path. In the penultimate step $\tau^{(l-2)}$, the next node on the path has a higher probability of being chosen because it is certainly adjacent to the target node. In the final step $\tau^{(l-1)}$, the next node is the target node and thus has a higher probability of being chosen.
  • Figure 3: Estimating a distribution of sampling probabilities from sampled objects and their probabilities. The top row shows a graph, its cycles, and the distribution of the occurrence probabilities of the cycles. The goal is to approximate the distribution (dotted arrow) efficiently. To do this, we sample multiple uniform spanning trees. Such a spanning tree (bottom row) also effectively samples a set of (induced) cycles. The distribution of these cycles is, in expectation, the correct distribution multiplied by the occurrence probability. By multiplying the sampled distribution by the inverse of the occurrence probability, we get an approximation of the true distribution. The final approximation is the average over all approximations we obtained from different spanning trees.
  • Figure 4: Random lifting from 1-dim. CC to 2-dim. CC and our sampling algorithm. The model (strong background colors) is simulated by dividing it into two steps (light background colors): First, $s$ uniform spanning trees are sampled on the graph, each inducing a subset of all cycles. Depending on the probability $\rho_c$ for any cycle to appear in such a subset, the cycles are then sampled as cells. The algorithm (boxes) closely follows the two-step sampling model, approximating $\rho_c$ for efficiency.
  • Figure 5: Illustration of the efficient calculation based on the tree structure.
  • ...and 14 more figures
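
The importance-sampling step described in the captions of Figures 3 and 4 can be written as a short estimator. The notation is an assumption made for illustration and may differ from the paper's: $T_1, \dots, T_s$ are the sampled uniform spanning trees, $C(T_i)$ is the set of cycles induced by $T_i$, $\rho_c$ is the probability that cycle $c$ is induced by a single uniform spanning tree, and $N_\ell$ is the number of simple cycles of length $\ell$ in the graph.

$$
\hat{N}_\ell \;=\; \frac{1}{s} \sum_{i=1}^{s} \; \sum_{\substack{c \in C(T_i) \\ |c| = \ell}} \frac{1}{\rho_c},
\qquad
\mathbb{E}\big[\hat{N}_\ell\big] \;=\; \sum_{c \,:\, |c| = \ell} \rho_c \cdot \frac{1}{\rho_c} \;=\; N_\ell .
$$

The expectation holds when the exact $\rho_c$ is used and the graph is connected, so that every simple cycle has $\rho_c > 0$. With an approximated $\rho_c$ the estimate is itself only approximate. If every cycle of length $\ell$ is then included as a 2-cell independently with probability $p_\ell$, linearity of expectation gives $p_\ell N_\ell$ cells of that boundary length in expectation, so a desired expected count $n_\ell$ (a hypothetical symbol) would suggest choosing $p_\ell \approx n_\ell / \hat{N}_\ell$.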

Theorems & Definitions (4)

  • Theorem 1
  • proof
  • Remark 1: Complexity of calculating cycle lengths
  • Remark 2: Complexity of sampling $2$-cells