A Clique Partitioning-Based Algorithm for Graph Compression

Akshar Chavan, Sanaz Rabinia, Daniel Grosu, Marco Brocanelli

TL;DR

The paper speeds up path-dependent graph algorithms on large graphs by introducing CPGC, a lossless compression method for bipartite graphs that preserves path information while substantially reducing the number of edges. CPGC improves on the Feder-Motwani (FM) clique-partitioning approach by selecting vertices by degree and extracting multiple $\delta$-cliques per iteration, giving an overall running time of $O(mn^{\delta})$ and a compression bound of $|E^*| = O(m/k)$. Empirically, CPGC achieves a compression ratio up to 26% greater than FM and runs up to 105.18 times faster, and a matching algorithm on sufficiently large, dense graphs runs up to 72.83% faster when given the compressed graph instead of the original. The approach thus provides a scalable, path-preserving compression framework that accelerates downstream graph tasks such as all-pairs shortest paths and matching while maintaining exact path information. The paper also covers an extension to non-bipartite graphs, with appendices detailing FM, examples, proofs, and the non-bipartite transformation.

Abstract

Reducing the running time of graph algorithms is vital for tackling real-world problems such as shortest paths and matching in large-scale graphs, where path information plays a crucial role. This paper addresses this challenge by proposing a new graph compression algorithm that partitions the graph into bipartite cliques and uses the partition to obtain a compressed graph with fewer edges while preserving the path information. This compressed graph can then be used as input to other graph algorithms for which path information is essential, leading to a significant reduction of their running time, especially for large, dense graphs. The running time of the proposed algorithm is $O(mn^{\delta})$, where $0 \leq \delta \leq 1$, which is better than $O(mn^{\delta}\log^2 n)$, the running time of the best existing clique partitioning-based graph compression algorithm, the Feder-Motwani (FM) algorithm. Our extensive experimental analysis shows that our algorithm achieves a compression ratio up to 26% greater and executes up to 105.18 times faster than the FM algorithm. In addition, on large graphs with up to 1.05 billion edges, it achieves a compression ratio of up to 3.9, reducing the number of edges by up to 74.36%. Finally, our tests with a matching algorithm on sufficiently large, dense graphs demonstrate a reduction in running time of up to 72.83% when the input is the compressed graph obtained by our algorithm, compared to the case where the input is the original uncompressed graph.
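To make the compression idea concrete, here is a minimal Python sketch of the basic step behind clique partitioning-based compression: a complete bipartite block is replaced by a star through one new intermediate vertex. It is an illustration under stated assumptions, not the CPGC algorithm itself. The function names, the vertex name `z1`, and the extra edge `("u6", "w1")` are hypothetical; only the partition sizes come from the Figure 1(c) caption. The helper `edge_reduction_percent` assumes the compression ratio is defined as $|E| / |E^*|$, which is consistent with the reported pair of 3.9 and 74.36%.

```python
from itertools import product


def compress_biclique(edges, left, right, z):
    """Replace the complete bipartite block left x right by a star through z."""
    block = set(product(left, right))
    if not block <= edges:
        raise ValueError("left x right is not a complete bipartite subgraph")
    # |left| * |right| edges become |left| + |right| edges via z.
    star = {(u, z) for u in left} | {(z, w) for w in right}
    return (edges - block) | star


def edge_reduction_percent(compression_ratio):
    """Percent of edges removed, ASSUMING ratio = |E| / |E*|."""
    return 100.0 * (1.0 - 1.0 / compression_ratio)


# Partition sizes follow the Figure 1(c) caption; the extra edge is made up.
left = ["u1", "u2", "u3", "u4", "u5", "u7", "u8"]
right = ["w4", "w5"]
edges = set(product(left, right)) | {("u6", "w1")}

compressed = compress_biclique(edges, left, right, "z1")
print(len(edges), len(compressed))
print(round(edge_reduction_percent(3.9), 2))
```

In this toy instance the 14 clique edges shrink to 9, and the edge outside the clique is left untouched. The saving grows with the clique size, which is why dense graphs benefit most.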

Paper Structure

This paper contains 19 sections, 4 theorems, 1 equation, 6 figures, 1 table, 3 algorithms.

Key Result

Theorem 4.1

The compressed graph $G^*(U,W,Z,E^*)$ obtained by CPGC preserves the path information of the original graph $G(U,W,E)$.
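One way to read this statement is that $u \in U$ is adjacent to $w \in W$ in $G$ exactly when $G^*$ contains either the direct edge $(u, w)$ or a two-hop path $u \to z \to w$ through some $z \in Z$. The Python sketch below checks that property on a small made-up instance. The function `expand`, the vertex names, and this reading of "path information" are assumptions for illustration, not the paper's proof.

```python
def expand(compressed_edges, z_vertices):
    """Recover U-W adjacency: direct edges plus every two-hop path u -> z -> w."""
    into_z, out_of_z, recovered = {}, {}, set()
    for a, b in compressed_edges:
        if b in z_vertices:        # edge from U into an intermediate vertex
            into_z.setdefault(b, set()).add(a)
        elif a in z_vertices:      # edge from an intermediate vertex into W
            out_of_z.setdefault(a, set()).add(b)
        else:                      # edge that was never part of a clique
            recovered.add((a, b))
    for z in z_vertices:
        for u in into_z.get(z, ()):
            for w in out_of_z.get(z, ()):
                recovered.add((u, w))
    return recovered


# Hypothetical original graph: the block {u1,u2,u3} x {w1,w2} plus one extra edge.
original = {(u, w) for u in ("u1", "u2", "u3") for w in ("w1", "w2")} | {("u4", "w2")}
# Hypothetical compressed graph: the block is routed through z1.
compressed = {("u1", "z1"), ("u2", "z1"), ("u3", "z1"),
              ("z1", "w1"), ("z1", "w2"), ("u4", "w2")}

assert expand(compressed, {"z1"}) == original
```

Here the compressed graph has 6 edges against 7 in the original, and expanding it recovers every original adjacency and no others, which is the sense in which the compression is lossless.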

Figures (6)

  • Figure 1: (a) Given bipartite graph $G(U,W,E)$; (b) Neighborhood tree of vertex $u_2 \in U$ that shows the path $\omega$ taken from root to the vertex $w_j \in W$ at the leaf and number of edges $d_{u_2,\omega}$ for each node using the tuple $(\omega, d_{u_2,\omega})$; and (c) the tripartite graph that replaces the $\delta$-clique with left partition $\{u_1,u_2,u_3, u_4, u_5, u_7, u_8\}$ and right partition $\{w_4, w_5\}$ in the compressed graph $G^{*}(U,W,Z, E^{*})$.
  • Figure 2: Progression of $\hat{m}$, $\hat{k}$, and the number of cliques extracted by FM and CPGC for a graph with 128 vertices in each bipartition, density 0.98, and $\delta = 1$.
  • Figure 5: CPGC: Average compression ratio (top row: (a), (b), and (c)) and running time (bottom row: (d), (e), and (f)) for graphs with 3.36 million to 1.05 billion edges.
  • Figure 6: CPGC vs. FM: Compression ratio relative to FM (top row: (a), (b) and (c)) and speedup relative to FM (bottom row: (d), (e) and (f)) for graphs with 819 to 16 thousand edges.
  • Figure 7: Dinitz($G$) vs. Dinitz($G^*$): Average running time of Dinitz's algorithm on the original bipartite graph $G$ (labeled Dinitz($G$)) and on the compressed graph $G^*$ (labeled Dinitz($G^*$)), for a large graph with approximately 32,000 vertices in each bipartition and 214.75 million to 1.05 billion edges, corresponding to densities $0.80$, $0.90$, and $0.98$, and different values of $\delta$.
  • ...and 1 more figure

Theorems & Definitions (4)

  • Theorem 4.1
  • Theorem 4.2
  • Lemma 4.1
  • Theorem 4.3