Overlapping and Robust Edge-Colored Clustering in Hypergraphs
Alex Crane, Brian Lavallee, Blair D. Sullivan, Nate Veldt
TL;DR
The paper tackles edge-colored hypergraph clustering with two practical needs: allowing overlapping cluster memberships and tolerating noise. It generalizes Edge-Colored Clustering (ECC) into three budgeted variants, Local ECC, Global ECC, and Robust ECC, and develops greedy algorithms that approximately minimize edge mistakes, along with LP-based bicriteria approximations whose second approximation factor is on the budget. On the parameterized side, it gives FPT algorithms for the combined parameter $t+b$ (number of edge mistakes plus budget), shows W-hardness when parameterizing by $t$ or $b$ alone, and provides kernelization bounds. Experiments on six real datasets show that the LP-rounding methods often achieve near-optimal edge satisfaction with fast runtimes, supporting the approach's usefulness for real-world hypergraph data.
Abstract
A recent trend in data mining has explored (hyper)graph clustering algorithms for data with categorical relationship types. Such algorithms have applications in the analysis of social, co-authorship, and protein interaction networks, to name a few. Many such applications naturally have some overlap between clusters, a nuance which is missing from current combinatorial models. Additionally, existing models lack a mechanism for handling noise in datasets. We address these concerns by generalizing Edge-Colored Clustering, a recent framework for categorical clustering of hypergraphs. Our generalizations allow for a budgeted number of either (a) overlapping cluster assignments or (b) node deletions. For each new model we present a greedy algorithm which approximately minimizes an edge mistake objective, as well as bicriteria approximations where the second approximation factor is on the budget. Additionally, we address the parameterized complexity of each problem, providing FPT algorithms and hardness results.
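To make the edge mistake objective concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the usual ECC convention that an edge is satisfied only when every one of its nodes carries the edge's color, that an overlapping assignment gives a node a set of colors, and that deleted nodes are ignored when checking an edge. The names `count_mistakes` and `extra_assignments` and the toy hypergraph are hypothetical, chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# An edge is a (set of nodes, color) pair. This representation is an assumption.
Edge = Tuple[FrozenSet[str], str]


def count_mistakes(
    edges: List[Edge],
    colors_of: Dict[str, Set[str]],
    deleted: FrozenSet[str] = frozenset(),
) -> int:
    """Count edges with at least one non-deleted node lacking the edge's color."""
    mistakes = 0
    for nodes, color in edges:
        if any(
            v not in deleted and color not in colors_of.get(v, set())
            for v in nodes
        ):
            mistakes += 1
    return mistakes


def extra_assignments(colors_of: Dict[str, Set[str]]) -> int:
    """Total number of colors assigned beyond the first one per node."""
    return sum(max(0, len(colors) - 1) for colors in colors_of.values())


# Toy hypergraph: node "b" lies in one red edge and one blue edge.
edges: List[Edge] = [
    (frozenset({"a", "b"}), "red"),
    (frozenset({"b", "c"}), "blue"),
]

# One color per node: the blue edge is a mistake because "b" is red.
single = {"a": {"red"}, "b": {"red"}, "c": {"blue"}}
mistakes_single = count_mistakes(edges, single)

# Overlapping: spend one extra assignment so "b" holds both colors.
overlap = {"a": {"red"}, "b": {"red", "blue"}, "c": {"blue"}}
mistakes_overlap = count_mistakes(edges, overlap)
budget_used = extra_assignments(overlap)

# Robust: delete "b" instead, so neither edge is violated by it.
mistakes_robust = count_mistakes(edges, single, deleted=frozenset({"b"}))
```

In this toy instance a budget of one, spent either on an extra color for `b` or on deleting `b`, removes the single mistake. The sketch only evaluates a given assignment; choosing the assignment is the optimization problem the paper's greedy and LP-based algorithms address.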
