Hyb-Adam-UM: hybrid ultrametric-aware mtDNA phylogeny reconstruction
Dmitrii Chaikovskii, Weilai Qu, Boris Melnikov, Ye Zhang, Yuehong Zhao
TL;DR
This work addresses incomplete mtDNA distance matrices in distance-based phylogeny by enforcing ultrametric-consistent geometry during completion. It introduces Hyb-Adam-UM, a two-stage method that first builds an alignment-backed distance backbone from Needleman–Wunsch distances and then refines the missing entries by minimizing a robust triplet ultrametric-violation score $\Delta(D)$ with an Adam-style finite-difference optimizer, keeping observed distances fixed. The key contribution is the $\Delta(D)$ objective and its practical optimization, which improve ultrametric consistency, topological accuracy, and branch-length agreement, especially at high missingness (up to 85%). Experiments on $15\times 15$ mtDNA matrices show that Hyb-Adam-UM reduces ultrametric violations and performs competitively with or better than projection-based and low-rank baselines, with the largest gains in the ultra-sparse regime. An implementation is available online; the method targets settings where the alignment budget is limited.
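To make the triplet idea concrete, the following minimal sketch scores how far a distance matrix is from ultrametric. It relies on the standard three-point condition (in an ultrametric, the two largest distances of every triplet are equal). The function name `triplet_violation` and the mean-squared-gap aggregation are assumptions made for illustration; the exact robust form of $\Delta(D)$ used by Hyb-Adam-UM is not specified in this summary and may differ.

```python
from itertools import combinations

import numpy as np


def triplet_violation(D):
    """Mean squared gap between the two largest distances of each triplet.

    Illustrative stand-in for Delta(D); the paper's robust score may differ.
    """
    n = D.shape[0]
    gaps = []
    for i, j, k in combinations(range(n), 3):
        # Sort the three pairwise distances of the triplet (i, j, k).
        _, mid, top = sorted((D[i, j], D[i, k], D[j, k]))
        # An ultrametric requires the two largest to be equal.
        gaps.append((top - mid) ** 2)
    return float(np.mean(gaps)) if gaps else 0.0


# Ultrametric: taxa 0 and 1 are close, taxon 2 is equidistant from both.
U = np.array([[0.0, 2.0, 4.0], [2.0, 0.0, 4.0], [4.0, 4.0, 0.0]])
# Not ultrametric: the two largest distances (3 and 4) differ.
V = np.array([[0.0, 2.0, 3.0], [2.0, 0.0, 4.0], [3.0, 4.0, 0.0]])

print(triplet_violation(U))  # 0.0
print(triplet_violation(V))  # 1.0
```

A score of zero means every triplet satisfies the three-point condition; larger values indicate stronger departures from tree-like (clock-like) structure.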
Abstract
Motivation: mtDNA distance matrices are standard inputs for distance-based phylogeny, but computing all pairwise alignments is costly. Missing entries can degrade inferred topology and branch lengths, and generic matrix-completion methods may disrupt tree-like (ultrametric) structure.
Results: We propose Hyb-Adam-UM, which starts from an alignment-limited Needleman–Wunsch distance backbone and completes the matrix by minimizing a robust triplet ultrametric-violation functional. An Adam-style finite-difference optimizer updates only missing entries while enforcing symmetry, non-negativity, and a zero diagonal. From one complete reference matrix, we generate 20 masked instances at 30%, 50%, 65%, and 85% missingness. Hyb-Adam-UM consistently reduces ultrametric violations and achieves competitive reconstruction error, with improved topological accuracy and branch-length agreement relative to MW*/NJ* projection baselines (which exactly preserve observed distances) and Soft-Impute; gains are most pronounced at 85% missingness.
Availability and implementation: https://github.com/mitichya/hyb-adam-um/; Zenodo: https://doi.org/10.5281/zenodo.18609748.
Supplementary information: Supplementary data available online.
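The completion step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the released implementation: the names `triplet_violation` and `complete_missing`, the mean-squared-gap objective, the mean-value initialization, and all hyperparameters are hypothetical choices. The sketch estimates gradients by central finite differences, applies Adam updates to the missing upper-triangular entries only, mirrors each update to keep the matrix symmetric, clips at zero, and never touches observed entries or the diagonal.

```python
from itertools import combinations

import numpy as np


def triplet_violation(D):
    """Assumed objective: mean squared gap between the two largest distances."""
    gaps = []
    for i, j, k in combinations(range(len(D)), 3):
        _, mid, top = sorted((D[i, j], D[i, k], D[j, k]))
        gaps.append((top - mid) ** 2)
    return float(np.mean(gaps)) if gaps else 0.0


def complete_missing(D_obs, mask, steps=200, lr=0.05, h=1e-3,
                     beta1=0.9, beta2=0.999, eps=1e-8):
    """Fill entries where mask is False; observed entries stay fixed."""
    D = D_obs.astype(float).copy()
    n = len(D)
    upper = [(i, j) for i in range(n) for j in range(i + 1, n)]
    missing = [(i, j) for i, j in upper if not mask[i, j]]
    observed = [D[i, j] for i, j in upper if mask[i, j]]
    # Assumed initialization: mean of the observed off-diagonal distances.
    init = float(np.mean(observed)) if observed else 1.0
    for i, j in missing:
        D[i, j] = D[j, i] = init
    m = np.zeros(len(missing))  # first-moment estimate
    v = np.zeros(len(missing))  # second-moment estimate
    for t in range(1, steps + 1):
        grad = np.zeros(len(missing))
        for idx, (i, j) in enumerate(missing):
            old = D[i, j]
            D[i, j] = D[j, i] = old + h
            f_plus = triplet_violation(D)
            D[i, j] = D[j, i] = old - h
            f_minus = triplet_violation(D)
            D[i, j] = D[j, i] = old  # restore before the next probe
            grad[idx] = (f_plus - f_minus) / (2 * h)  # central difference
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        # Bias-corrected Adam step.
        step = lr * (m / (1 - beta1 ** t)) / (np.sqrt(v / (1 - beta2 ** t)) + eps)
        for idx, (i, j) in enumerate(missing):
            # Symmetric write with clipping keeps the entry non-negative.
            D[i, j] = D[j, i] = max(D[i, j] - step[idx], 0.0)
    return D


# Toy example: a 4-taxon ultrametric matrix with one hidden pair (0, 2).
D_true = np.array([[0, 2, 6, 6], [2, 0, 6, 6], [6, 6, 0, 2], [6, 6, 2, 0]], float)
mask = np.ones((4, 4), dtype=bool)
mask[0, 2] = mask[2, 0] = False
D_hat = complete_missing(np.where(mask, D_true, 0.0), mask)
```

Each iteration costs two objective evaluations per missing entry, which is affordable at the $15\times 15$ scale considered here but would call for a restricted triplet set or an analytic gradient on larger matrices.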
