An objective function for order preserving hierarchical clustering

Daniel Bakkelund

TL;DR

This work extends hierarchical clustering to preserve order in probabilistic partial orders and DAGs by augmenting Dasgupta’s similarity-based objective with an order-preservation component. It introduces a relaxed order framework built on $\omega$, an antisymmetrisation $g$, and a combined objective $f=s_d+g$, yielding a bi-objective optimisation whose extremes recover pure clustering or pure ordering. The paper proves that optimal trees are order-preserving in special cases, analyses performance under planted partial orders, and gives a polynomial-time approximation algorithm with an $O(\log^{3/2} n)$ guarantee via a directed sparsest-cut approach. A demonstration on a machine-parts dataset shows improved clustering quality over existing order-preserving methods while maintaining the order constraints. The contributions are a formal definition of order-preserving hierarchical clustering, a concrete objective combining similarity and order, and an approximation algorithm with practical validation, together with guidance for future theory and implementations.

Abstract

We present a theory and an objective function for similarity-based hierarchical clustering of probabilistic partial orders and directed acyclic graphs (DAGs). Specifically, given elements $x \le y$ in the partial order, and their respective clusters $[x]$ and $[y]$, the theory yields an order relation $\le'$ on the clusters such that $[x]\le'[y]$. The theory provides a concise definition of order-preserving hierarchical clustering, and offers a classification theorem identifying the order-preserving trees (dendrograms). To determine the optimal order-preserving trees, we develop an objective function that frames the problem as a bi-objective optimisation, aiming to satisfy both the order relation and the similarity measure. We prove that the optimal trees under the objective are both order-preserving and exhibit high-quality hierarchical clustering. Since finding an optimal solution is NP-hard, we introduce a polynomial-time approximation algorithm and demonstrate that the method outperforms existing methods for order-preserving hierarchical clustering by a significant margin.
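To make the order-preservation requirement concrete, here is a minimal sketch of one plausible reading of it for a single flat clustering of a DAG: every edge $x \le y$ is mapped to an edge $[x] \to [y]$ between clusters, and the clustering is accepted only if the induced cluster graph is acyclic, so that it can be extended to an order $\le'$ on the clusters. The function names (`quotient_edges`, `is_acyclic`, `preserves_order`) and the toy data are assumptions made for illustration; the paper's formal definition covers probabilistic orders and whole trees, and may differ in detail.

```python
from collections import defaultdict


def quotient_edges(edges, cluster_of):
    """Map each DAG edge (x, y), read as x <= y, to an edge between clusters."""
    induced = set()
    for x, y in edges:
        cx, cy = cluster_of[x], cluster_of[y]
        if cx != cy:  # edges inside a single cluster impose no constraint
            induced.add((cx, cy))
    return induced


def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    succ = defaultdict(set)
    indeg = {n: 0 for n in nodes}
    for a, b in edges:
        if b not in succ[a]:
            succ[a].add(b)
            indeg[b] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    visited = 0
    while ready:
        n = ready.pop()
        visited += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    # Every node is visited exactly when no cycle blocks the traversal.
    return visited == len(indeg)


def preserves_order(edges, cluster_of):
    """Assumed criterion: the cluster graph induced by the DAG is acyclic."""
    clusters = set(cluster_of.values())
    return is_acyclic(clusters, quotient_edges(edges, cluster_of))


# Hypothetical toy data: the chain a <= b <= c <= d.
chain = [("a", "b"), ("b", "c"), ("c", "d")]
contiguous = {"a": "AB", "b": "AB", "c": "CD", "d": "CD"}
interleaved = {"a": "AC", "b": "B", "c": "AC", "d": "D"}

print(preserves_order(chain, contiguous))
print(preserves_order(chain, interleaved))
```

The contiguous clustering induces the single edge `AB -> CD`, whereas the interleaved one induces both `AC -> B` and `B -> AC`, a cycle that no order on the clusters can satisfy.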
