Tropical $k$-means clustering for phylogenetic trees

Fabian Lenzen, Lena Weis

TL;DR

A clustering algorithm for equidistant phylogenetic trees, based on the correspondence between $B(K_N)$ and the space of equidistant trees with $N$ leaves, is defined and analysed.

Abstract

The asymmetric tropical distance is a distance measure on the tropical torus $\mathbb{R}^n/\mathbb{R}\mathbf{1}$ and in particular on the Bergman fan $B(K_N) \subseteq \mathbb{R}^{\binom{N}{2}}/\mathbb{R}\mathbf{1}$ of the complete graphical matroid. In this paper, we define and analyse a clustering algorithm for equidistant phylogenetic trees based on this distance, using the correspondence between $B(K_N)$ and the space of equidistant trees with $N$ leaves.

Paper Structure (15 sections, 8 theorems, 46 equations, 8 figures, 1 table, 1 algorithm)

Key Result

Lemma 2.4

Let $x,y,z \in \mathbb{R}^{n} / \mathbb{R}\mathbb{1}$, then

Figures (8)

  • Figure 1: Excerpt from the Bergman fan $B(K_{4})$. The excerpt corresponds to the trees that have taxa $a$, $b$, $c$, $d$ in this order. Each cell corresponds to one combinatorial type of such a tree. The edges that join the cells correspond to degenerate trees as shown. The black lines are isolines of $d_{\Delta}(0, -)$, where $0$ is the tree corresponding to the center, and the red lines are the isolines of $d_{\Delta}(-, 0)$.
  • Figure 2: A local optimum for the tropical $2$-median clustering, with centroids $c_1 = v_1$ and $c_2 = v_2$. The points $c^*_1$ and $c^*_2$ correspond to the optimal clustering.
  • Figure 3: Iterations of tropical $k$-means++ clustering of phylogenetic trees on the four taxa $a$, $b$, $c$, $d$. The visualization of the space of such trees corresponds to Figure 1.
  • Figure 4: Losses of $100$ different tropical $17$-means++ clusterings of the apicomplexa dataset.
  • Figure 5: The set of tropical median consensus trees which are the final centroids of the tropical $k$-means clustering for $k = 17$ with the smallest loss in Figure 4.
  • ...and 3 more figures

Theorems & Definitions (27)

  • Example 2.1
  • Remark 2.2
  • Remark 2.3
  • Lemma 2.4: Pseudo-triangle inequality
  • proof
  • Definition 2.5
  • Remark 2.6
  • Remark 2.7
  • Proposition 3.1
  • proof
  • ...and 17 more