
Visualizing Geophylogenies -- Internal and External Labeling with Phylogenetic Tree Constraints

Jonathan Klawitter, Felix Klesen, Joris Y. Scholl, Thomas C. van Dijk, Alexander Zaft

TL;DR

The paper studies how to visualize geophylogenies by optimizing the leaf order of the tree under two labeling paradigms: internal labeling, which shows the correspondence through matching labels on the map, and external labeling, which uses leaders to connect each map site to its leaf. For internal labeling, it defines leaf-additive quality measures and gives a dynamic-programming algorithm that optimizes them in $\mathcal{O}(n^2)$ time and space. For external labeling, it proves that minimizing the number of leader crossings is NP-hard in general; on the positive side, crossing-free instances can be solved in polynomial time, and the paper provides a fixed-parameter tractable (FPT) algorithm, an integer linear programming (ILP) formulation, and several fast heuristics. Experiments on synthetic and real-world data show that the ILP solves realistic instances within seconds for $s$-leaders, that the heuristics are near optimal at minimal runtime, and that $po$-leaders show particular promise for reducing crossings. Overall, the work provides an algorithmic foundation for drawing geophylogenies and points to open questions on labeling under geographic and phylogenetic constraints.

Abstract

A geophylogeny is a phylogenetic tree (or dendrogram) where each leaf (e.g. biological taxon) has an associated geographic location (site). To clearly visualize a geophylogeny, the tree is typically represented as a crossing-free drawing next to a map. The correspondence between the taxa and the sites is either shown with matching labels on the map (internal labeling) or with leaders that connect each site to the corresponding leaf of the tree (external labeling). In both cases, a good order of the leaves is paramount for understanding the association between sites and taxa. We define several quality measures for internal labeling and give an efficient algorithm for optimizing them. In contrast, minimizing the number of leader crossings in an external labeling is NP-hard. On the positive side, we show that crossing-free instances can be solved in polynomial time and give a fixed-parameter tractable (FPT) algorithm. Furthermore, optimal solutions can be found in a matter of seconds on realistic instances using integer linear programming. Finally, we provide several efficient heuristic algorithms and experimentally show them to be near optimal on real-world and synthetic instances.

Paper Structure (38 sections, 6 theorems, 10 equations, 28 figures)


Key Result

Theorem 1

Let $G$ be a geophylogeny with $n$ taxa and let $f$ be a leaf additive quality measure. A drawing $\Gamma$ with internal labeling of $G$ that minimizes (or maximizes) $f$ can be computed in $\mathcal{O}(n^2)$ time and $\mathcal{O}(n^2)$ space.
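To make the $\mathcal{O}(n^2)$ bound more concrete, the following is a minimal sketch of one way such a dynamic program could look. It is not the authors' implementation. It assumes that "leaf additive" means the measure is a sum of per-leaf costs that depend only on the position a leaf receives in the leaf order, that the tree is binary, and that the tree is given as nested pairs of leaf names. The names `optimize_leaf_order` and `cost` are hypothetical.

```python
def optimize_leaf_order(tree, cost):
    """Minimize a sum of per-leaf position costs over all leaf orders
    that the binary tree admits (children may be swapped at each node).

    tree: a leaf name (str) or a pair (left_subtree, right_subtree).
    cost: dict mapping leaf name -> list of n costs, one per position.
    Assumption: the measure is the sum of cost[leaf][position].
    """
    n = len(cost)

    def solve(node):
        # best[i] = minimum cost of placing this subtree's leaves
        # at the consecutive positions i, i+1, ..., i+size-1.
        if isinstance(node, str):
            return {"leaf": node, "size": 1, "best": list(cost[node])}
        left, right = solve(node[0]), solve(node[1])
        size = left["size"] + right["size"]
        best, swap = [], []
        for i in range(n - size + 1):
            keep = left["best"][i] + right["best"][i + left["size"]]
            flip = right["best"][i] + left["best"][i + right["size"]]
            best.append(min(keep, flip))
            swap.append(flip < keep)  # remember which child order won
        return {"children": (left, right), "size": size,
                "best": best, "swap": swap}

    def place(rec, start, order):
        # Walk back through the stored choices to recover the order.
        if "leaf" in rec:
            order[start] = rec["leaf"]
            return
        left, right = rec["children"]
        if rec["swap"][start]:
            left, right = right, left
        place(left, start, order)
        place(right, start + left["size"], order)

    root = solve(tree)
    order = [None] * n
    place(root, 0, order)
    return root["best"][0], order


# Hypothetical example: each leaf prefers one position; the cost is
# the distance between the assigned and the preferred position.
tree = (("a", "b"), "c")
preferred = {"a": 2, "b": 1, "c": 0}
cost = {leaf: [abs(pos - want) for pos in range(3)]
        for leaf, want in preferred.items()}
total, order = optimize_leaf_order(tree, cost)
print(total, order)  # 0 ['c', 'b', 'a']
```

Each internal node fills at most $n$ table entries in constant time each, and a binary tree with $n$ leaves has $n - 1$ internal nodes, so the sketch uses $\mathcal{O}(n^2)$ time and space under the stated assumptions. The recursion depth grows with the height of the tree, so very unbalanced trees may need an iterative traversal instead.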

Figures (28)

  • Figure 1: To visualize this geophylogeny of the five present-day kiwi species (Tokoeka/South Island Brown Kiwi -- Apteryx australis, Rowi/Okarito Brown Kiwi -- A. rowi, North Island Brown Kiwi -- A. mantelli, Great Spotted Kiwi -- A. haastii, Little Spotted Kiwi -- A. owenii), we combine the phylogenetic tree (a) together with the distribution map (b) into a single figure (c). To this end, we may pick a rotation of the map and a placement of the tree as well as a leaf order that facilitates easy association based on the colors between the leaves and the features on the map. (Phylogeny and map inspired by Weir et al.)
  • Figure 2: Side-by-side drawings of geophylogenies from the literature.
  • Figure 3: Overlay drawings of geophylogenies from the literature.
  • Figure 4: In a drawing of a geophylogeny $G$, we place $T$ above $R$ and use either internal or external labeling to show the mapping between $P$ and $L(T)$. Figures (b) and (c) minimize the number of crossings for their leader type. Note the difference in embedding of $T$ and that not all permutations of leaves are possible.
  • Figure 5: Orange arrows indicate what the three quality measures for internal labeling consider.
  • ...and 23 more figures

Theorems & Definitions (6)

  • Theorem 1
  • Theorem 2
  • Proposition 3
  • Proposition 4
  • Lemma 1
  • Theorem 5