Clustering to Minimize Cluster-Aware Norm Objectives

Martin G. Herold, Evangelos Kipouridis, Joachim Spoerhase

TL;DR

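Constant-factor approximation algorithms are designed for cluster-aware norm objectives, a class that includes Min-Sum of Radii and Min-Load $k$-Clustering: one for $(\textsf{top}_\ell,\mathcal{L}_1)$-Clustering, and one for $(\mathcal{L}_\infty,\textsf{Ord})$-Clustering, where the outer norm is a convex combination of $\textsf{top}_\ell$ norms (ordered weighted norm).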

Abstract

We initiate the study of the following general clustering problem. We seek to partition a given set $P$ of data points into $k$ clusters by finding a set $X$ of $k$ centers and assigning each data point to one of the centers. The cost of a cluster, represented by a center $x\in X$, is a monotone, symmetric norm $f$ (inner norm) of the vector of distances of the points assigned to $x$. The goal is to minimize a norm $g$ (outer norm) of the vector of cluster costs. This problem, which we call $(f,g)$-Clustering, generalizes many fundamental clustering problems such as $k$-Center, $k$-Median, Min-Sum of Radii, and Min-Load $k$-Clustering. A recent line of research (Chakrabarty, Swamy [STOC'19]) studies norm objectives that are oblivious to the cluster structure, such as $k$-Median and $k$-Center. In contrast, our problem models cluster-aware objectives, including Min-Sum of Radii and Min-Load $k$-Clustering. Our main results are as follows. First, we design a constant-factor approximation algorithm for $(\textsf{top}_\ell,\mathcal{L}_1)$-Clustering, where the inner norm ($\textsf{top}_\ell$) sums over the $\ell$ largest distances. Second, we design a constant-factor approximation algorithm for $(\mathcal{L}_\infty,\textsf{Ord})$-Clustering, where the outer norm is a convex combination of $\textsf{top}_\ell$ norms (ordered weighted norm).
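
To make the $(f,g)$ objective concrete, the following minimal Python sketch evaluates the cost of a fixed clustering under a chosen inner and outer norm. It only evaluates the objective and is not the paper's approximation algorithm. The function names (`top_l`, `ordered`, `clustering_cost`), the distance-matrix layout, and the toy instance are illustrative assumptions. The ordered norm is implemented in its usual form, as a weighted sum of the entries sorted in non-increasing order with non-increasing, non-negative weights.

```python
from typing import Callable, Dict, List, Sequence

Norm = Callable[[Sequence[float]], float]


def l1(v: Sequence[float]) -> float:
    """L_1 norm: sum of absolute entries."""
    return float(sum(abs(x) for x in v))


def linf(v: Sequence[float]) -> float:
    """L_infinity norm: largest absolute entry (0 for an empty vector)."""
    return float(max((abs(x) for x in v), default=0.0))


def top_l(ell: int) -> Norm:
    """top_ell norm: sum of the ell largest absolute entries."""
    def norm(v: Sequence[float]) -> float:
        return float(sum(sorted((abs(x) for x in v), reverse=True)[:ell]))
    return norm


def ordered(weights: Sequence[float]) -> Norm:
    """Ordered weighted norm with non-increasing, non-negative weights.

    Assumption of this sketch: entries beyond len(weights) get weight 0.
    """
    if any(w < 0 for w in weights) or any(a < b for a, b in zip(weights, weights[1:])):
        raise ValueError("weights must be non-negative and non-increasing")

    def norm(v: Sequence[float]) -> float:
        s = sorted((abs(x) for x in v), reverse=True)
        return float(sum(w * x for w, x in zip(weights, s)))
    return norm


def clustering_cost(dist: Sequence[Sequence[float]],
                    assignment: Sequence[int],
                    inner: Norm,
                    outer: Norm) -> float:
    """Cost of a fixed clustering: outer norm of the per-cluster inner norms.

    dist[p][x] is the distance from point p to center x (hypothetical layout);
    assignment[p] is the index of the center that point p is assigned to.
    """
    clusters: Dict[int, List[float]] = {}
    for p, x in enumerate(assignment):
        clusters.setdefault(x, []).append(dist[p][x])
    return outer([inner(d) for d in clusters.values()])


# Toy instance: points at 0, 1, 2, 10, 13 on a line; centers at 1 and 10.
points, centers = [0, 1, 2, 10, 13], [1, 10]
dist = [[abs(p - c) for c in centers] for p in points]
assignment = [0, 0, 0, 1, 1]

k_median = clustering_cost(dist, assignment, l1, l1)        # (L_1, L_1)
k_center = clustering_cost(dist, assignment, linf, linf)    # (L_inf, L_inf)
sum_radii = clustering_cost(dist, assignment, linf, l1)     # (L_inf, L_1)
min_load = clustering_cost(dist, assignment, l1, linf)      # (L_1, L_inf)
top2_l1 = clustering_cost(dist, assignment, top_l(2), l1)   # (top_2, L_1)
linf_ord = clustering_cost(dist, assignment, linf, ordered([1.0, 0.5]))
```

Swapping the `inner` and `outer` arguments is all it takes to move between the classical objectives. The cluster-aware ones (Min-Sum of Radii, Min-Load) are those where the two norms differ, so the value depends on how distances are grouped into clusters.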

Paper Structure

This paper contains 39 sections, 26 theorems, 68 equations, 7 figures, 1 table.

Key Result

Lemma 3.1

Let $\mathcal{I}=(P,F,\delta,k,\textsf{top}_\ell,\mathcal{L}_1)$ be an instance of $(\textsf{top}_\ell,\mathcal{L}_1)$-Clustering. Then the instance $\mathcal{I}'=(P,F,\delta,k,\rho=\ell)$ of Ball $k$-Median satisfies the following two properties.

Figures (7)

  • Figure 1: A dataset of two clusters generated by two different 2D Gaussians, on which we run $k$-Median, Min-Sum of Radii, and $(\textsf{top}_8,\mathcal{L}_1)$-Clustering. In the $k$-Median solution, points from the large cluster end up in the small cluster, while the opposite happens for Min-Sum of Radii. Only $(\textsf{top}_8,\mathcal{L}_1)$-Clustering recovers the original clusters. In the fourth plot, the radii of the balls indicate the $\ell$-th largest distance ($\ell=8$) in their respective clusters; these balls also relate directly to Ball $k$-Median (see the section on $(\textsf{top}_\ell,\mathcal{L}_1)$-Clustering).
  • Figure 2: LP for facility-location Ball $k$-Median and predetermined $(T,\hat{r})$.
  • Figure 3: Dual LP for the facility-location LP (Figure 2).
  • Figure 4: LP to decide which facilities from $X_1,X_2$ to open.
  • Figure 5: LP for FL-MSRDC.
  • ...and 2 more figures

Theorems & Definitions (65)

  • Definition 2.1: Nested Norm $k$-Clustering
  • Definition 2.2: $(I,O)$-Clustering
  • Definition 2.3: top-$\ell$ norm
  • Definition 2.4: ordered norm
  • Definition 2.5
  • Definition 3.1: Ball $k$-Median
  • Lemma 3.1
  • proof
  • Lemma 3.2
  • proof
  • ...and 55 more