Visualizing Geophylogenies -- Internal and External Labeling with Phylogenetic Tree Constraints
Jonathan Klawitter, Felix Klesen, Joris Y. Scholl, Thomas C. van Dijk, Alexander Zaft
TL;DR
The paper studies how to visualize geophylogenies by optimizing the leaf order of the tree under two labeling paradigms: internal labeling, which marks each site on the map with a label matching its leaf, and external labeling, which draws a leader from each site to its leaf. For internal labeling, it introduces leaf-additive quality measures and a dynamic program that finds an optimal leaf order in $O(n^2)$ time. For external labeling, minimizing the number of leader crossings is NP-hard in general, but crossing-free instances can be solved in polynomial time, and the paper also gives a fixed-parameter tractable (FPT) algorithm, an integer linear programming (ILP) formulation, and several fast heuristics. Experiments on synthetic and real-world data show that the ILP solves practical instances with $s$-leaders (straight-line leaders) quickly, that the heuristics come close to optimal at negligible runtime, and that $po$-leaders (parallel-orthogonal leaders) are particularly promising for reducing crossings. Overall, the work gives an algorithmic foundation for drawing geophylogenies, provides practical tools for designers, and leaves several open questions about labeling under combined geographic and phylogenetic constraints.
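To make the dynamic-programming idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a rooted binary tree given as nested tuples, leaf slots numbered 0 to n-1, and a hypothetical leaf-additive cost function `cost(leaf, pos)` that charges each leaf independently for the slot it receives. The only freedom is swapping the two children of an inner vertex, so each vertex stores the best cost of its subtree for every possible starting slot.

```python
from typing import Callable, List, Tuple, Union

# A tree is either a leaf name or a pair of subtrees (assumed encoding).
Tree = Union[str, Tuple["Tree", "Tree"]]


def optimal_leaf_order(
    tree: Tree, cost: Callable[[str, int], float]
) -> Tuple[float, List[str]]:
    """Minimize sum(cost(leaf, slot)) over all child swaps of a binary tree."""

    def size(t: Tree) -> int:
        return 1 if isinstance(t, str) else size(t[0]) + size(t[1])

    n = size(tree)

    def solve(t: Tree) -> dict:
        # best[i] = cheapest cost when the subtree occupies slots i .. i+size-1
        if isinstance(t, str):
            return {"leaf": t, "size": 1, "best": [cost(t, i) for i in range(n)]}
        a, b = solve(t[0]), solve(t[1])
        s = a["size"] + b["size"]
        best, swap = [], []
        for i in range(n - s + 1):
            keep = a["best"][i] + b["best"][i + a["size"]]  # a first, then b
            flip = b["best"][i] + a["best"][i + b["size"]]  # b first, then a
            best.append(min(keep, flip))
            swap.append(flip < keep)
        return {"kids": (a, b), "size": s, "best": best, "swap": swap}

    def collect(rec: dict, start: int, out: List[str]) -> None:
        # Replay the stored swap decisions to recover the leaf order.
        if "leaf" in rec:
            out.append(rec["leaf"])
            return
        a, b = rec["kids"]
        first, second = (b, a) if rec["swap"][start] else (a, b)
        collect(first, start, out)
        collect(second, start + first["size"], out)

    root = solve(tree)
    order: List[str] = []
    collect(root, 0, order)
    return root["best"][0], order


# Hypothetical example: cost is the offset between a leaf's slot and its site's rank.
site_rank = {"a": 3, "b": 2, "c": 0, "d": 1}
example = (("a", "b"), ("c", "d"))
print(optimal_leaf_order(example, lambda leaf, pos: abs(site_rank[leaf] - pos)))
# prints (0, ['c', 'd', 'b', 'a'])
```

Each of the at most 2n-1 vertices fills a table with at most n entries in constant time per entry, which is consistent with the quadratic bound stated above. The recursion is kept simple for readability and would need an iterative traversal for very deep trees.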
Abstract
A geophylogeny is a phylogenetic tree (or dendrogram) where each leaf (e.g. biological taxon) has an associated geographic location (site). To clearly visualize a geophylogeny, the tree is typically represented as a crossing-free drawing next to a map. The correspondence between the taxa and the sites is either shown with matching labels on the map (internal labeling) or with leaders that connect each site to the corresponding leaf of the tree (external labeling). In both cases, a good order of the leaves is paramount for understanding the association between sites and taxa. We define several quality measures for internal labeling and give an efficient algorithm for optimizing them. In contrast, minimizing the number of leader crossings in an external labeling is NP-hard. On the positive side, we show that crossing-free instances can be solved in polynomial time and give a fixed-parameter tractable (FPT) algorithm. Furthermore, optimal solutions can be found in a matter of seconds on realistic instances using integer linear programming. Finally, we provide several efficient heuristic algorithms and experimentally show them to be near optimal on real-world and synthetic instances.
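The objective for external labeling can be illustrated with a short sketch that counts leader crossings for one fixed leaf order. It is an illustration under stated assumptions, not the paper's method: leaders are straight segments, leaves sit at evenly spaced x-coordinates on a horizontal line above the map, sites are points below that line, and no three relevant points are collinear. The names `count_leader_crossings`, `leaf_y`, and `spacing` are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def _orient(p: Point, q: Point, r: Point) -> int:
    """Sign of the cross product (q - p) x (r - p): +1, 0, or -1."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def segments_cross(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if segments ab and cd intersect in their interiors (general position)."""
    return (
        _orient(a, b, c) * _orient(a, b, d) < 0
        and _orient(c, d, a) * _orient(c, d, b) < 0
    )


def count_leader_crossings(
    leaf_order: Sequence[str],
    sites: Dict[str, Point],
    leaf_y: float = 0.0,
    spacing: float = 1.0,
) -> int:
    """Count pairwise crossings among straight leaders for a given leaf order."""
    # The i-th leaf in the order is placed at (i * spacing, leaf_y).
    leaders = [
        ((i * spacing, leaf_y), sites[name]) for i, name in enumerate(leaf_order)
    ]
    return sum(
        segments_cross(a, b, c, d) for (a, b), (c, d) in combinations(leaders, 2)
    )


# Hypothetical sites below the leaf line y = 0.
sites = {"a": (0.0, -1.0), "b": (1.0, -1.0)}
print(count_leader_crossings(["a", "b"], sites))  # prints 0
print(count_leader_crossings(["b", "a"], sites))  # prints 1
```

Evaluating one order this way takes quadratic time in the number of leaves. The difficulty lies in choosing among the exponentially many leaf orders that the tree permits.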
