An incremental exact algorithm for the hyper-rectangular clustering problem with axis-parallel clusters

Diego Delle Donne, Javier Marenco, Eduardo Moreno

TL;DR

This work proposes an adaptive exact strategy that exploits the ability of previous approaches to solve small instances to optimality, and proves that as soon as a solution covers the whole set of points of the instance, it is an optimal solution for the original problem.

Abstract

We address the problem of clustering a set of points in $\mathbb{R}^d$ with axis-parallel clusters. Previous exact approaches to this problem are mostly based on integer programming formulations and can only solve small instances to optimality. In this work we propose an adaptive exact strategy that exploits the ability of previous approaches to solve small instances to optimality. Our algorithm starts by solving an instance with a small subset of points and iteratively adds more points if these are not covered by the obtained solution. We prove that as soon as a solution covers the whole set of points of the instance, it is an optimal solution for the original problem. We compare the efficiency of the new method against the existing ones through exhaustive computational experiments, which show that the new approach solves to optimality instances that are orders of magnitude larger.
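The incremental loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the integer programming solver is replaced by a brute-force enumeration (`solve_exact`) that is only practical for a handful of points, the span of a hyper-rectangle is assumed to be the sum of its side lengths, and all function names and the parameters `initial_size` and `batch` are hypothetical.

```python
from itertools import product

def bounding_box(cluster):
    """Smallest axis-parallel hyper-rectangle containing a non-empty cluster."""
    dims = range(len(cluster[0]))
    lo = tuple(min(x[t] for x in cluster) for t in dims)
    hi = tuple(max(x[t] for x in cluster) for t in dims)
    return lo, hi

def span(box):
    # Assumption: span = sum of the side lengths over all coordinates.
    lo, hi = box
    return sum(h - l for l, h in zip(lo, hi))

def total_span(boxes):
    return sum(span(b) for b in boxes)

def covers(box, x):
    lo, hi = box
    return all(l <= c <= h for l, c, h in zip(lo, x, hi))

def solve_exact(sample, p):
    """Brute-force stand-in for an exact solver: tries all p^n assignments.

    Returns the bounding boxes of the non-empty clusters of a
    minimum-total-span assignment (empty clusters contribute nothing).
    """
    best = None
    for labels in product(range(p), repeat=len(sample)):
        groups = [[x for x, k in zip(sample, labels) if k == j] for j in range(p)]
        boxes = [bounding_box(g) for g in groups if g]
        if best is None or total_span(boxes) < total_span(best):
            best = boxes
    return best

def incremental_clustering(points, p, initial_size=2, batch=1):
    """Solve on a small sample; add uncovered points until all are covered."""
    sample = list(points[:initial_size])
    while True:
        boxes = solve_exact(sample, p)
        uncovered = [x for x in points if not any(covers(b, x) for b in boxes)]
        if not uncovered:
            # The solution covers every point, so it is optimal for the
            # full instance (this is the covering result proved in the paper).
            return boxes
        sample.extend(uncovered[:batch])
```

The loop terminates because every iteration adds at least one point that was not yet in the sample, and sample points are always covered by the boxes computed from them. Which points to start from and how many to add per iteration are design choices that this sketch fixes arbitrarily.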

Paper Structure

This paper contains 9 sections, 2 theorems, 3 equations, 9 figures, 2 algorithms.

Key Result

Lemma 1

Let $\hat{\mathcal{X}} \subseteq \mathcal{X}$ be a subset of points and $\mathbb{C}$ an optimal $p$-clustering of $\hat{\mathcal{X}}$ (i.e., with minimum total span). If $\mathbb{C}$ covers $\mathcal{X}$, then it represents an optimal $p$-clustering for $\mathcal{X}$.
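The relaxation argument behind Lemma 1 can be sketched in a few lines. This is a sketch rather than the paper's own proof, and it assumes that a $p$-clustering of a point set is any collection of $p$ axis-parallel hyper-rectangles covering that set; the notation $\operatorname{span}(\cdot)$ for total span is introduced here for illustration. Let $\mathbb{C}'$ be any $p$-clustering of $\mathcal{X}$. Since $\hat{\mathcal{X}} \subseteq \mathcal{X}$, the clustering $\mathbb{C}'$ also covers $\hat{\mathcal{X}}$, so it is feasible for the smaller instance, and the optimality of $\mathbb{C}$ on $\hat{\mathcal{X}}$ gives

$$\operatorname{span}(\mathbb{C}) \le \operatorname{span}(\mathbb{C}').$$

If $\mathbb{C}$ additionally covers $\mathcal{X}$, then it is itself feasible for $\mathcal{X}$ and, by the inequality above, no feasible $p$-clustering of $\mathcal{X}$ has smaller total span. Hence $\mathbb{C}$ is optimal for $\mathcal{X}$.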

Figures (9)

  • Figure 1: (a) Sample instance with dimension $d=2$ and $n=57$ points, and (b) optimal solution for this instance, with $p=4$ clusters.
  • Figure 2: A small example in $\mathbb{R}^2$ illustrating the difference between a uniform random sample (left) and a sample which identifies points which are most likely to lie in the borders of the hyper-rectangles defining the optimal solution (right). Black points represent the sample $\hat{\mathcal{X}}$ and gray points are the remaining points $\mathcal{X} \setminus \hat{\mathcal{X}}$.
  • Figure 3: Eccentricity (fixed on the horizontal coordinate, $t=1$ in the example) of different points in a cluster as the points are taken closer to the border of the cluster.
  • Figure 4: Example in which a point that lies on the border of a cluster is detected neither by the neighbour metric nor by the eccentricity metric.
  • Figure 5: Processing times on instances in $\mathbb{R}^3$ with $p=4$ clusters, ranging from 40 to 1000 points.
  • ...and 4 more figures

Theorems & Definitions (8)

  • Lemma 1
  • proof
  • Remark 1
  • Proposition 1
  • proof
  • Remark 2
  • Definition 1
  • Definition 2