
Correlation Clustering with Overlap: a Heuristic Graph Editing Approach

Faisal N. Abu-Khzam, Lucas Isenmann, Sergio Thoumi

TL;DR

This paper addresses correlation clustering when data elements may belong to multiple groups and clusters should be diameter-bounded rather than strict cliques. It proposes 2CCEDVS (2-Club Cluster Edge Deletion with Vertex Splitting), the $s = 2$ case of a model that combines $s$-Club Cluster Edge Deletion with Vertex Splitting to allow overlapping clusters and reduce the amount of editing, and it provides two heuristics that scale to large graphs. Empirical results on synthetic LFR benchmarks and real biological networks show that 2CCEDVS delivers high-quality, overlap-aware clusters, often outperforming established baselines in overlapping scenarios. The work highlights practical gains and outlines directions for speedups and for further parameterization to control per-vertex edits.
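The 2-club target behind 2CCEDVS can be checked directly: a graph is an $s$-club cluster graph when every connected component has diameter at most $s$. The Python sketch below tests this with one breadth-first search per vertex. The function names and the toy graph are our own illustrative assumptions; this is a verification routine only, not one of the paper's heuristics.

```python
from collections import deque


def bfs_distances(adj, src):
    """Hop distances from src to every vertex in its connected component."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_s_club_cluster_graph(adj, s):
    """True if every connected component has diameter at most s.

    adj maps each vertex to the set of its neighbours (undirected graph).
    """
    for v in adj:
        # The eccentricity of v inside its component must not exceed s.
        if max(bfs_distances(adj, v).values()) > s:
            return False
    return True


def make_graph(edges):
    """Build an undirected adjacency dict from an edge list."""
    adj = {}
    for u, w in edges:
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)
    return adj


# Toy path b-a-v-c-d: b and d are 4 hops apart, so it is not a 2-club.
path = make_graph([("b", "a"), ("a", "v"), ("v", "c"), ("c", "d")])
print(is_s_club_cluster_graph(path, 2))    # False

# Deleting the single edge v-c leaves components {a, b, v} and {c, d}.
edited = make_graph([("b", "a"), ("a", "v"), ("c", "d")])
print(is_s_club_cluster_graph(edited, 2))  # True
```

The check runs one BFS per vertex, so it takes O(n(n + m)) time; it is meant for validating an output on small graphs, not as a scalable clustering method.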

Abstract

Correlation clustering seeks a partition of the vertex set of a given graph/network into groups of closely related, or just close enough, vertices so that elements of different groups are not close to each other. The problem has previously been modeled and studied as a graph editing problem, namely Cluster Editing, which assumes that closely related data elements must be adjacent. As such, the main objective of Cluster Editing is to turn clusters into cliques as a way to identify them. This is achieved via two edge editing operations: additions and deletions. There are two problems with the Cluster Editing model that we seek to address in this paper. First, "closely" related does not necessarily mean "directly" related, so closeness should be measured by relatively short distance. As such, we seek to turn clusters into (sub)graphs of small diameter. Second, in real applications, a data element can belong to, or have roles in, multiple groups. In some cases, not allowing a data element to belong to more than one cluster makes it hard to achieve any clustering via classical partition-based methods. We address this latter problem by allowing vertex cloning, also known as vertex splitting. Heuristic methods for the introduced problem are presented along with experimental results showing the effectiveness of the proposed model and algorithmic approach.
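Vertex splitting is what lets a data element end up in more than one cluster. The Python sketch below assumes the variant in which a vertex v is replaced by two copies whose neighbourhoods partition the neighbourhood of v; the paper may define the operation differently. The helper names, the `(v, 1)`/`(v, 2)` naming of copies, and the choice of `part` are illustrative assumptions, not the authors' definitions or heuristics. After the split, clusters are read off as connected components and each copy is mapped back to its original vertex.

```python
def split_vertex(adj, v, part):
    """Replace v by two copies, (v, 1) and (v, 2).

    (v, 1) keeps the neighbours of v that lie in `part`; (v, 2) keeps the rest.
    Returns a new adjacency dict; the input is left unchanged.
    """
    new = {u: set(ns) for u, ns in adj.items() if u != v}
    c1, c2 = (v, 1), (v, 2)
    new[c1], new[c2] = set(), set()
    for w in adj[v]:
        copy = c1 if w in part else c2
        new[w].discard(v)   # detach w from the original vertex
        new[w].add(copy)    # and attach it to the chosen copy
        new[copy].add(w)
    return new


def components(adj):
    """Connected components of an undirected adjacency dict."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, stack = set(), [start]
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def original(x):
    """Map a copy (v, i) back to v; original vertices are plain strings here."""
    return x[0] if isinstance(x, tuple) else x


# Toy path b-a-v-c-d, where v plausibly belongs to {a, b} and to {c, d}.
edges = [("b", "a"), ("a", "v"), ("v", "c"), ("c", "d")]
adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

split = split_vertex(adj, "v", part={"a"})
clusters = sorted(sorted(original(x) for x in comp) for comp in components(split))
print(clusters)  # [['a', 'b', 'v'], ['c', 'd', 'v']]
```

In this toy instance, v lands in both clusters, each of which has diameter 2. Deleting the edge v-c would also yield a 2-club cluster graph, but v would then belong to only one cluster; deciding which neighbours go into `part` is the choice a heuristic has to make.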


Paper Structure

This paper contains 8 sections, 1 figure, 3 tables, 2 algorithms.

Figures (1)

  • Figure 1: A graph clustered via 2CCED and via 2CCEDVS, respectively.