A classification of overlapping clustering schemes for hypergraphs

Vilhelm Agdur

TL;DR

This work develops a rigorous, category-theoretic framework for overlapping clustering in hypergraphs, extending the representability program of Carlsson–Mémoli to overlapping partitions. It shows that a clustering scheme is representable if and only if it is excisive and functorial, up to a refinement notion when $k>1$, and that every excisive and functorial scheme is refined by a representable one; it also provides computational guarantees on graphs of bounded expansion for fixed representations. The paper introduces endofunctors $\Phi_{\mathfrak{R}}$ determined by a representing set $\mathfrak{R}$ and composes them with the $k$-line component functor $\Pi_k$ to realize clustering schemes as $\Pi_k \circ \Phi_{\mathfrak{R}}$, with results on when such schemes are, and are not, finitely representable. Additionally, for graphs of bounded expansion it establishes complexity bounds, showing linear-time computation relative to graph size and a polynomial bound depending on $|V|^{\alpha(\mathfrak{R})}$, where $\alpha(\mathfrak{R})$ is the maximum independence number of a graph in $\mathfrak{R}$. Overall, the work provides a principled framework for comparing and constructing clustering schemes with theoretical guarantees in hypergraph settings.

Abstract

Community detection in graphs is a problem that is likely to be relevant whenever network data appears, and consequently the problem has received much attention with many different methods and algorithms applied. However, many of these methods are hard to study theoretically, and they optimise for somewhat different goals. A general and rigorous account of the problem and possible methods remains elusive. We study the problem of finding overlapping clusterings of hypergraphs, continuing the line of research started by Carlsson and Mémoli (2013) of classifying clustering schemes as functors. We extend their notion of representability to the overlapping case, showing that any representable overlapping clustering scheme is excisive and functorial, and any excisive and functorial clustering scheme is isomorphic to a representable clustering scheme. We also note that, for simple graphs, any representable clustering scheme is computable in polynomial time on graphs of bounded expansion, with an exponent determined by the maximum independence number of a graph in the representing set. This result also applies to non-overlapping representable clustering schemes, and so may be of independent interest.
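To make the quantity in the complexity statement concrete, the following is a minimal sketch of how one might compute the maximum independence number over a small representing set of simple graphs. It is a brute-force illustration only, not the algorithm from the paper; the function names `independence_number` and `alpha`, and the encoding of a graph as a `(vertices, edges)` pair, are assumptions made for this example.

```python
from itertools import combinations


def independence_number(vertices, edges):
    """Size of a largest independent set of a simple graph (brute force)."""
    vertices = list(vertices)
    edge_set = {frozenset(e) for e in edges}
    # Try candidate sizes from largest to smallest; the first hit is the maximum.
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            # A subset is independent if no pair of its vertices is an edge.
            if all(frozenset(p) not in edge_set for p in combinations(subset, 2)):
                return size
    return 0  # graph with no vertices


def alpha(representing_set):
    """Maximum independence number over (vertices, edges) pairs (hypothetical encoding)."""
    return max(independence_number(v, e) for v, e in representing_set)


# Hypothetical representing set: a path and a triangle on three vertices.
path = ([1, 2, 3], [(1, 2), (2, 3)])
triangle = ([1, 2, 3], [(1, 2), (2, 3), (1, 3)])
exponent_parameter = alpha([path, triangle])
```

The brute-force search is exponential in the size of each representing graph, which is acceptable here only because the representing set is treated as fixed and small.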

Paper Structure (11 sections, 19 theorems, 12 equations, 11 figures)


Key Result

Theorem 1.1

Any representable clustering scheme is excisive and functorial. Any excisive and functorial clustering scheme is isomorphic to a representable clustering scheme. Further, for any class $\mathcal{C}$ of simple graphs of bounded expansion and any finitely representable clustering scheme $\Pi_{\mathfra

Figures (11)

  • Figure 1: An illustration of the resolution limit issue with modularity.
  • Figure 2: An illustration of two graphs in ${\mathcal{G}}$, with $H$ a subgraph of $G$, where the image of this morphism under $\Lambda_2$ gives a surprising morphism in $\mathcal{P}$, as explained in Example \ref{ex:scandalous_morphism}. Note that $H$ has all the edges of $G$, and an additional two edges, illustrated in red and orange -- the other edges have had their colour faded to help clarity, but are still present.
  • Figure 3: Consider the situation where $\mathfrak{R}$ consists of just $E_3$, the graph on three vertices with one edge containing all three vertices, and $G$ and $H$ are as in the figures. As we have seen previously, we will have $\Phi_\mathfrak{R}(G)$ isomorphic to $G$ and likewise for $H$. It is clear that $\Phi_\mathfrak{R}(G)$ is $2$-ly connected. $\Lambda_2(\Phi_\mathfrak{R}(H))$ has two connected components, one containing the blue edges and one containing the green edges, and so $\Pi_{\mathfrak{R},2}(H)$ will have two parts, $\{v_1, v_2, v_3, v_4\}$ and $\{v_3, v_4, v_5, v_6\}$. However, when we add $G$ into $\mathfrak{R}$, this will add some new edges into $\Phi_{\mathfrak{R}\cup \{G\}}(H)$, with vertex sets $v_1, v_2, v_3, v_4$ and $v_3, v_4, v_5, v_6$. (There will be $4$ edges with each vertex set, due to the symmetries of $G$.) These two edge sets overlap in $v_3$ and $v_4$, and so these edges form a connected component of $\Lambda_2(\Phi_{\mathfrak{R}\cup \{G\}}(H))$, which will be sent to a part containing all vertices. Thus $\Pi_{\mathfrak{R}\cup\{G\},2}(H)$ is not equal to $\Pi_{\mathfrak{R},2}(H)$.
  • Figure 4: The family of graphs $\{R_i\}_{i=0}^3$.
  • Figure 5: An example of a graph $G$ with a $2$-connected component $p$ (here, the entire vertex set) where $p''$ is a strict subset of $p'$, in the notation of the proof of Proposition \ref{lem:pi_k_is_excisive}. In particular, we note that $p''$ consists of the three blue edges, which form a clique in the $2$-line graph since they all overlap in the vertices $v_1$ and $v_2$, but the $2$-line graph also has an isolated vertex corresponding to the red edge. This red edge is in $p'$, but not in $p''$, and will be sent to a strict subpart of $p$ by $\Upsilon$.
  • ...and 6 more figures
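
The captions for Figures 3 and 5 describe parts as arising from connected components of the $k$-line graph, in which hyperedges are adjacent when they overlap in at least $k$ vertices. The sketch below illustrates that idea under stated assumptions: the adjacency rule is read off the captions rather than the paper's formal definitions, each component is simply sent to the union of its edges' vertex sets, and the function names and the example hypergraph `H` are hypothetical (it is not the graph drawn in Figure 3, only one chosen to produce the same two parts). It does not model the representation step $\Phi_{\mathfrak{R}}$ or any special treatment of nested parts.

```python
from itertools import combinations


def k_line_components(edges, k):
    """Group hyperedges into connected components of the k-line graph."""
    edges = [frozenset(e) for e in edges]
    parent = list(range(len(edges)))  # union-find over edge indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Assumed adjacency rule: two hyperedges are linked if they share >= k vertices.
    for i, j in combinations(range(len(edges)), 2):
        if len(edges[i] & edges[j]) >= k:
            parent[find(i)] = find(j)

    groups = {}
    for i, e in enumerate(edges):
        groups.setdefault(find(i), []).append(e)
    return list(groups.values())


def parts(edges, k):
    """Send each k-line component to the union of its edges' vertex sets."""
    return {frozenset().union(*comp) for comp in k_line_components(edges, k)}


# Hypothetical hypergraph with two "blue" and two "green" 3-vertex edges.
H = [{1, 2, 3}, {1, 2, 4}, {3, 5, 6}, {4, 5, 6}]
overlapping_parts = parts(H, 2)  # two parts that share vertices 3 and 4
single_part = parts(H, 1)        # with k = 1 every edge is linked
```

With $k = 2$ the two parts overlap in vertices 3 and 4 without merging, which is the kind of overlapping clustering the framework is designed to describe; lowering $k$ to 1 merges everything into one part.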

Theorems & Definitions (69)

  • Theorem 1.1: Main results
  • Definition 3.1
  • Definition 3.2
  • Definition 3.3
  • Definition 3.4
  • Remark 3.5
  • Definition 3.6
  • Definition 3.7
  • Definition 3.8
  • Definition 4.1
  • ...and 59 more