New Algorithms and Hardness Results for Connected Clustering

Jan Eube, Heiko Röglin

TL;DR

This work gives exact algorithms for connected clustering that run in polynomial time when the treewidth $w$ is a constant, and constant-factor approximation algorithms that run in FPT time with respect to the parameter $\max(w,k)$.

Abstract

Connected clustering denotes a family of constrained clustering problems in which we are given a distance metric and an undirected connectivity graph $G$ that can be completely unrelated to the metric. The aim is to partition the $n$ vertices into a given number $k$ of clusters such that every cluster forms a connected subgraph of $G$ and a given clustering objective is minimized. The constraint that the clusters are connected has applications in many different fields, such as community detection and geodesy. So far, $k$-center and $k$-median have been studied in this setting. It has been shown that connected $k$-median is $\Omega(n^{1-\varepsilon})$-hard to approximate, which also carries over to the connected $k$-means problem, while for connected $k$-center it remained an open question whether one can find a constant approximation in polynomial time. We answer this question by providing an $\Omega(\log^*(k))$-hardness result for the problem. Given these hardness results, we study the problems on graphs with bounded treewidth. We provide exact algorithms that run in polynomial time if the treewidth $w$ is a constant. Furthermore, we obtain constant approximation algorithms that run in FPT time with respect to the parameter $\max(w,k)$. Additionally, we consider the min-sum-radii (MSR) and min-sum-diameter (MSD) objectives. We prove that on general graphs connected MSR can be approximated with an approximation factor of $(3 + \varepsilon)$ and connected MSD with an approximation factor of $(4 + \varepsilon)$. The latter also directly improves the best known approximation guarantee for unconstrained MSD from $(6 + \varepsilon)$ to $(4 + \varepsilon)$.
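To make the problem statement concrete, the following is a minimal sketch of how one might check that a candidate clustering is feasible and evaluate the $k$-center objective on it. It is not an algorithm from the paper. The function names, the adjacency-dict graph representation, and the requirement that each center lies in its own cluster are assumptions made for illustration.

```python
from collections import deque

def is_connected(cluster, adj):
    """BFS restricted to `cluster`: True if it induces a connected subgraph."""
    cluster = set(cluster)
    if not cluster:
        return False
    start = next(iter(cluster))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v in cluster and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == cluster

def connected_k_center_cost(clusters, centers, adj, dist):
    """Largest center-to-point distance, or None if the clustering is infeasible.

    Assumption (illustrative): each center must belong to its own cluster.
    """
    covered = [v for c in clusters for v in c]
    # The clusters must partition the vertex set of the connectivity graph.
    if len(covered) != len(set(covered)) or set(covered) != set(adj):
        return None
    for c, center in zip(clusters, centers):
        if center not in c or not is_connected(c, adj):
            return None
    return max(dist(center, v) for c, center in zip(clusters, centers) for v in c)

# Hypothetical instance: the connectivity graph is the path 0-1-2-3, while the
# metric places 0 and 2 close together and 1 and 3 close together.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
pos = {0: 0, 1: 10, 2: 1, 3: 11}

def dist(u, v):
    return abs(pos[u] - pos[v])

feasible = connected_k_center_cost([{0, 1}, {2, 3}], [0, 2], adj, dist)
infeasible = connected_k_center_cost([{0, 2}, {1, 3}], [0, 1], adj, dist)
```

On this toy instance the metrically natural clusters $\{0,2\}$ and $\{1,3\}$ are rejected because neither is connected in the path, which illustrates how a connectivity graph unrelated to the metric can force a much larger radius.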

Paper Structure

This paper contains 17 sections, 34 theorems, 36 equations, 7 figures, 1 table, 10 algorithms.

Key Result

Lemma 4

Given a tree decomposition $(T,\{B_t\}_{t \in V_T})$ of a graph $G = (V,E)$ with width $w$, one can compute a nice tree decomposition of $G$ of width $w$ and with $O(w|V|)$ nodes in time $O(\max(|V_T|,|V|)w^2)$.
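The input to Lemma 4 is a tree decomposition of width $w$. As a reminder of what that means, the sketch below checks the three standard conditions (every vertex is in some bag, every edge is contained in some bag, and the bags containing any fixed vertex form a connected subtree) and computes the width as the largest bag size minus one. It does not compute a nice tree decomposition. The function names and the dictionary-based representation are illustrative assumptions, and the code assumes that `tree_adj` already describes a tree.

```python
def is_tree_decomposition(vertices, edges, tree_adj, bags):
    """Check the standard tree decomposition conditions.

    vertices: iterable of graph vertices
    edges:    iterable of (u, v) pairs
    tree_adj: dict mapping each tree node to its neighbouring tree nodes
              (assumed to be a tree; this is not verified here)
    bags:     dict mapping each tree node to a set of graph vertices
    """
    vertices = set(vertices)
    # (1) Every vertex appears in at least one bag, and bags hold only vertices.
    if set().union(*bags.values()) != vertices:
        return False
    # (2) Both endpoints of every edge share at least one bag.
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            return False
    # (3) The tree nodes whose bags contain v induce a connected subtree.
    for v in vertices:
        nodes = {t for t, bag in bags.items() if v in bag}
        start = next(iter(nodes))
        seen = {start}
        stack = [start]
        while stack:
            t = stack.pop()
            for s in tree_adj[t]:
                if s in nodes and s not in seen:
                    seen.add(s)
                    stack.append(s)
        if seen != nodes:
            return False
    return True

def width(bags):
    """Width of a decomposition: largest bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1

# Hypothetical example: the 4-cycle a-b-c-d-a with two bags of size 3.
cycle_vertices = ["a", "b", "c", "d"]
cycle_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
good_tree = {"t1": ["t2"], "t2": ["t1"]}
good_bags = {"t1": {"a", "b", "c"}, "t2": {"a", "c", "d"}}
bad_bags = {"t1": {"a", "b"}, "t2": {"c", "d"}}  # edges b-c and d-a are uncovered
```

A checker of this kind is mainly useful for validating inputs before running a dynamic program over the decomposition.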

Figures (7)

  • Figure 1: A solution for the graph $G_t$ fulfilling the specification $(a,\{U_1,U_2,U_3\})$ where $a(U_1) = a(U_3) = c_1$ and $a(U_2) = c_2$ with $c_2 \not\in V_t$. The solution requires $3$ clusters; finished ones are circled in black, the unfinished ones in the color corresponding to their assignment.
  • Figure 2: A sketch of a $2$-layer subinstance included in a $3$-layer instance. Note that the number of inputs and outputs is not accurate and there are further gadgets, centers and subinstances in the $3$-layer instance.
  • Figure 3: A depiction of how a solution for $G_t$ (corresponding to a node $t$ joining two children $s_1$ and $s_2$) can be split up into two solutions for $G_{s_1}$ and $G_{s_2}$. The green ellipsoids correspond to clusters in $G_{s_1}$, the blue ones to clusters in $G_{s_2}$ and the black ones correspond to the unfinished clusters of the solution for $G_t$.
  • Figure 4: The connectivity graph corresponding to the 3-SAT formula $(x_1 \lor x_2 \lor x_3) \land (\overline{x}_2 \lor \overline{x}_3)$. Green vertices are placed at position $1$, blue ones at $2$ and black ones at $\perp$.
  • Figure 5: A subgraph of the connectivity graph with two layers corresponding to the 3-SAT formula $(\overline{x}_1 \lor x_2) \land (\overline{x}_2)$. Blue edges correspond to the clause $(\overline{x}_1 \lor x_2)$, green ones to the clause $(\overline{x}_2)$, magenta ones to the variable $x_1$ and red ones to the variable $x_2$. The vertices and edges in the subgraph $I(p,\{1,2\})$ (with $p \in [2]$) are for the most part not depicted. The complete connectivity graph also contains the vertices $T_3$, $F_3$, $x_{1,3}$, $\overline{x}_{1,3}$, $x_{2,3}$ and $\overline{x}_{2,3}$, as well as $8$ copies of one-layer instances connecting them with the vertices with first coordinate $1$ and $8$ copies connecting them with the vertices with first coordinate $2$.
  • ...and 2 more figures

Theorems & Definitions (67)

  • Definition 1
  • Definition 2: Tree Decomposition
  • Definition 3: Nice Tree Decomposition
  • Lemma 4: Lemma 7.4 in Cygan et al., Parameterized Algorithms (2015)
  • Theorem 5
  • Theorem 6
  • Theorem 7
  • Definition 7
  • Definition 7
  • Definition 7
  • ...and 57 more