Hyb-Adam-UM: hybrid ultrametric-aware mtDNA phylogeny reconstruction

Dmitrii Chaikovskii, Weilai Qu, Boris Melnikov, Ye Zhang, Yuehong Zhao

TL;DR

This work addresses incomplete mtDNA distance matrices used for distance-based phylogeny by enforcing ultrametric-consistent geometry during completion. It introduces Hyb-Adam-UM, a two-stage method that first builds an alignment-backed distance backbone from Needleman–Wunsch distances and then refines the missing entries by minimizing a robust triplet ultrametric-violation score $\Delta(D)$ with an Adam-style finite-difference optimizer, keeping observed distances fixed. The key contribution is the $\Delta(D)$ objective and its practical optimization, which improve ultrametric consistency, topological accuracy, and branch-length agreement, especially at high missingness (up to 85%). Experiments on $15\times 15$ mtDNA matrices show that Hyb-Adam-UM reduces ultrametric violations and performs competitively with or better than projection-based and low-rank baselines, with the largest gains in the ultra-sparse regime. An implementation is available online, supporting mtDNA phylogeny reconstruction when the alignment budget is limited.

Abstract

Motivation: mtDNA distance matrices are standard inputs for distance-based phylogeny, but computing all pairwise alignments is costly. Missing entries can degrade inferred topology and branch lengths, and generic matrix-completion methods may disrupt tree-like (ultrametric) structure.

Results: We propose Hyb-Adam-UM, which starts from an alignment-limited Needleman–Wunsch distance backbone and completes the matrix by minimizing a robust triplet ultrametric-violation functional. An Adam-style finite-difference optimizer updates only missing entries while enforcing symmetry, non-negativity, and a zero diagonal. From one complete reference matrix, we generate 20 masked instances at 30%, 50%, 65%, and 85% missingness. Hyb-Adam-UM consistently reduces ultrametric violations and achieves competitive reconstruction error, with improved topological accuracy and branch-length agreement relative to MW*/NJ* projection baselines (which exactly preserve observed distances) and Soft-Impute; gains are most pronounced at 85% missingness.

Availability and implementation: https://github.com/mitichya/hyb-adam-um/; Zenodo: https://doi.org/10.5281/zenodo.18609748

Supplementary information: Supplementary data available online.
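
The completion step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the exact form of the robust functional $\Delta(D)$ is not given here, so the code assumes a plain triplet score (the gap between the two largest distances of each triplet, which is zero for an ultrametric). The names `ultrametric_violation` and `complete_matrix`, the mean-value initialization, the hyperparameters, and the choice to return the best iterate are all hypothetical.

```python
import itertools
import math


def ultrametric_violation(D):
    """Assumed stand-in for Delta(D): sum over triplets of (largest - second largest)."""
    total = 0.0
    for i, j, k in itertools.combinations(range(len(D)), 3):
        _, mid, largest = sorted((D[i][j], D[i][k], D[j][k]))
        total += largest - mid
    return total


def complete_matrix(D_obs, missing_pairs, epochs=100, lr=0.05, h=1e-3,
                    beta1=0.9, beta2=0.999, eps=1e-8):
    """Fill the entries listed in missing_pairs ((i, j) with i < j); observed entries stay fixed."""
    D = [list(map(float, row)) for row in D_obs]
    missing = set(missing_pairs)
    observed = [D[i][j] for i, j in itertools.combinations(range(len(D)), 2)
                if (i, j) not in missing]
    fill = sum(observed) / len(observed)  # assumed initialization: mean observed distance
    for i, j in missing_pairs:
        D[i][j] = D[j][i] = fill

    m = [0.0] * len(missing_pairs)  # Adam first-moment estimates
    v = [0.0] * len(missing_pairs)  # Adam second-moment estimates
    history = [ultrametric_violation(D)]
    best_D, best_score = [row[:] for row in D], history[0]

    for t in range(1, epochs + 1):
        # Central finite differences, one missing entry at a time (kept symmetric).
        grad = []
        for i, j in missing_pairs:
            old = D[i][j]
            D[i][j] = D[j][i] = old + h
            f_plus = ultrametric_violation(D)
            D[i][j] = D[j][i] = old - h
            f_minus = ultrametric_violation(D)
            D[i][j] = D[j][i] = old
            grad.append((f_plus - f_minus) / (2 * h))

        # Adam update with bias correction, then clip at zero for non-negativity.
        for idx, (i, j) in enumerate(missing_pairs):
            m[idx] = beta1 * m[idx] + (1 - beta1) * grad[idx]
            v[idx] = beta2 * v[idx] + (1 - beta2) * grad[idx] ** 2
            m_hat = m[idx] / (1 - beta1 ** t)
            v_hat = v[idx] / (1 - beta2 ** t)
            new = D[i][j] - lr * m_hat / (math.sqrt(v_hat) + eps)
            D[i][j] = D[j][i] = max(new, 0.0)

        score = ultrametric_violation(D)
        history.append(score)
        if score < best_score:
            best_D, best_score = [row[:] for row in D], score

    return best_D, history
```

Calling `complete_matrix(D_obs, [(0, 2), (1, 4)])` on a symmetric matrix with a zero diagonal returns the completed matrix and the per-epoch score trace. Because the score is non-smooth, the trace need not decrease monotonically, which is why this sketch returns the best iterate rather than the last one. Each epoch costs $O(|\text{missing}| \cdot n^3)$ score evaluations in this naive form, which is only reasonable for small matrices such as the $15\times 15$ case.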

Paper Structure (15 sections, 13 equations, 5 figures, 2 tables, 2 algorithms)

Figures (5)

  • Figure 1: Illustration of triangle relations arising from a symmetric distance matrix.
  • Figure 2: Triangle with side lengths $a,b,c$ and angles $\alpha,\beta,\gamma$.
  • Figure 3: Heatmaps of the reference distance matrix $D_{\mathrm{ref}}$ and the reconstructed matrices for a representative replicate at $p=50\%$ missingness.
  • Figure 4: Neighbor-Joining trees inferred from the reference matrix $D_{\mathrm{ref}}$ and from the reconstructed matrices shown in Figure 3 (same representative replicate at $p=50\%$ missingness).
  • Figure 5: Optimization dynamics of Hyb-Adam-UM at $p=50\%$ missingness: $\Delta(D_t)$ versus epoch for five masking replicates (lines). Circles with error bars indicate mean $\pm$ SD across replicates at selected epochs. The dashed horizontal line shows the baseline $\Delta(D_{\mathrm{ref}})=105.474$ for the reference complete matrix.