Variational Bayesian Phylogenetic Inference with Semi-implicit Branch Length Distributions

Tianyu Xie, Frederick A. Matsen, Marc A. Suchard, Cheng Zhang

TL;DR

This work tackles the inefficiency of exploring large, multimodal tree spaces in Bayesian phylogenetics by adopting variational inference with a flexible semi-implicit branch length model. It introduces VBPI-SIBranch, which uses graph neural networks to produce permutation-invariant, semi-implicit branch length posteriors conditioned on topology, and couples this with two surrogate lower bounds, MSILB and MIWLB, to train over both topology and branch lengths. The approach yields improved marginal likelihood estimates and tighter branch-length posterior approximations across eight benchmark datasets, with MIWLB providing the strongest performance in many cases. Overall, the method demonstrates that rich variational families coupled with principled lower-bound surrogates can offer scalable, accurate alternatives to MCMC for phylogenetic inference and opens avenues for conditioning mixing distributions on topology through learned graph representations.
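MSILB and MIWLB are not defined in this summary. For orientation, the sketch below shows a related, well-known surrogate: the semi-implicit variational inference (SIVI) bound of Yin and Zhou (2018). It replaces the intractable log marginal $\log Q(\bm{q})$ with the log of a $(K+1)$-component mixture built from the hidden draw that generated $\bm{q}$ plus $K$ fresh hidden draws. This is a swapped-in technique, not necessarily either of the paper's bounds. The one-dimensional Gaussian family, the target, and the function names are toy assumptions, chosen so that the exact ELBO is 0.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    """Elementwise log density of N(mean, var)."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def sivi_surrogate_bound(log_target, K, n, rng, s2=1.0):
    """Monte Carlo estimate of the SIVI surrogate bound L_K.

    Toy semi-implicit family (an assumption for illustration):
        z ~ N(0, 1),  q | z ~ N(z, s2).
    Its marginal N(0, 1 + s2) is never evaluated; only q | z is used.
    """
    z0 = rng.normal(size=n)                          # hidden draw that generates q
    q = z0 + np.sqrt(s2) * rng.normal(size=n)        # q ~ Q(q | z0)
    z_new = rng.normal(size=(n, K))                  # K fresh hidden draws per q
    z_all = np.concatenate([z0[:, None], z_new], axis=1)   # shape (n, K + 1)
    log_cond = log_normal_pdf(q[:, None], z_all, s2)       # log Q(q | z^k)
    # Stable log of the mixture (1 / (K + 1)) * sum_k Q(q | z^k).
    m = log_cond.max(axis=1, keepdims=True)
    log_mix = m[:, 0] + np.log(np.mean(np.exp(log_cond - m), axis=1))
    return float(np.mean(log_target(q) - log_mix))

# Normalized target N(0, 2): with s2 = 1 the family contains it, so ELBO = 0
# and each L_K should come out slightly below 0 (up to Monte Carlo noise).
rng = np.random.default_rng(0)
target = lambda q: log_normal_pdf(q, 0.0, 2.0)
for K in (1, 10, 100):
    print(K, sivi_surrogate_bound(target, K, 20000, rng))
```

Keeping $z^0$ inside the mixture is what makes $L_K$ a valid lower bound on the ELBO. Without it, Jensen's inequality biases the estimate of $-\log Q(\bm{q})$ upward. As $K$ grows, the mixture approaches $Q(\bm{q})$ and the bound tightens toward the ELBO.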

Abstract

Reconstructing the evolutionary history relating a collection of molecular sequences is the main subject of modern Bayesian phylogenetic inference. However, the commonly used Markov chain Monte Carlo methods can be inefficient due to the complicated space of phylogenetic trees, especially when the number of sequences is large. An alternative approach is variational Bayesian phylogenetic inference (VBPI), which transforms the inference problem into an optimization problem. While effective, the default diagonal lognormal approximation that VBPI uses for the branch lengths is often insufficient to capture the complexity of the exact posterior. In this work, we propose a more flexible family of branch length variational posteriors based on semi-implicit hierarchical distributions using graph neural networks. We show that this semi-implicit construction readily yields permutation-equivariant distributions and can therefore handle the non-Euclidean branch length space across different tree topologies with ease. To deal with the intractable marginal probability of semi-implicit variational distributions, we develop several alternative lower bounds for stochastic optimization. We demonstrate the effectiveness of our proposed method over baseline methods on benchmark data examples, in terms of both marginal likelihood estimation and branch length posterior approximation.
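The semi-implicit construction can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation:

- The per-edge features are random placeholders standing in for the GNN output.
- The hidden variables are i.i.d. standard normal.
- The conditional is assumed lognormal, as in the default VBPI branch length model.
- `init_mlp`, `mlp`, and `sample_branch_lengths` are hypothetical names.

Because the same MLPs are applied to every edge row, permuting the edges together with their hidden variables permutes the sampled branch lengths in the same way.

```python
import numpy as np

def init_mlp(rng, d_in, d_hidden, d_out):
    """Random weights for a one-hidden-layer MLP (illustrative initialization)."""
    return (rng.normal(scale=0.3, size=(d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(scale=0.3, size=(d_hidden, d_out)), np.zeros(d_out))

def mlp(params, x):
    """Apply the same MLP independently to every row of x."""
    W1, b1, W2, b2 = params
    return np.tanh(x @ W1 + b1) @ W2 + b2

def sample_branch_lengths(edge_feats, z, mlp_mu, mlp_sigma, eps):
    """One reparameterized draw of all branch lengths given hidden variables z.

    edge_feats: (n_edges, d_h) placeholder for GNN edge features
    z:          (n_edges, d_z) i.i.d. hidden variables, one row per edge
    eps:        (n_edges,) standard normal noise
    Assumes a lognormal conditional: log q_e ~ N(mu_e, sigma_e^2).
    """
    inp = np.concatenate([edge_feats, z], axis=1)
    mu = mlp(mlp_mu, inp)[:, 0]
    sigma = np.exp(mlp(mlp_sigma, inp)[:, 0])      # keep sigma positive
    return np.exp(mu + sigma * eps)

# Five-leaf unrooted tree: 2 * 5 - 3 = 7 edges.
rng = np.random.default_rng(1)
n_edges, d_h, d_z = 7, 4, 2
feats = rng.normal(size=(n_edges, d_h))
mu_net = init_mlp(rng, d_h + d_z, 16, 1)
sig_net = init_mlp(rng, d_h + d_z, 16, 1)

z = rng.normal(size=(n_edges, d_z))
eps = rng.normal(size=n_edges)
q = sample_branch_lengths(feats, z, mu_net, sig_net, eps)

# Relabel the edges: the draw is permuted in exactly the same way.
perm = rng.permutation(n_edges)
q_perm = sample_branch_lengths(feats[perm], z[perm], mu_net, sig_net, eps[perm])
print(np.allclose(q_perm, q[perm]))  # True
```

Averaging over draws of `z` turns each per-edge lognormal into a continuous mixture. That mixture is the extra flexibility over a single diagonal lognormal.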

Paper Structure

This paper contains 37 sections, 4 theorems, 55 equations, 7 figures, 2 tables, and 2 algorithms.

Key Result

Proposition 1

Suppose $\bm{z}=[\bm{z}_e]_{e\in E(\tau)}$ and $\bm{z}_{\pi}=[\bm{z}_{\pi(e)}]_{e\in E(\tau)}$. If $Q_{\bm{\psi}}(\bm{q}|\tau,\bm{z})$ and $Q_{\bm{\psi}}(\bm{z}|\tau)$ in the semi-implicit construction (Eq. si-dist) are permutation invariant, i.e., $Q_{\bm{\psi}}(\bm{q}_{\pi}|\tau,\bm{z}_\pi)=Q_{\bm{\psi}}(\bm{q}|\tau,\bm{z})$ and $Q_{\bm{\psi}}(\bm{z}_{\pi}|\tau)=Q_{\bm{\psi}}(\bm{z}|\tau)$, …
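The statement above is cut off. Assume its conclusion is that the marginal $Q_{\bm{\psi}}(\bm{q}|\tau)=\int Q_{\bm{\psi}}(\bm{q}|\tau,\bm{z})\,Q_{\bm{\psi}}(\bm{z}|\tau)\,d\bm{z}$ is then permutation invariant as well. Under that assumption, the argument can be sketched in three steps:

1. Write the marginal as an integral.
2. Substitute $\bm{z}\mapsto\bm{z}_\pi$. This only permutes coordinates, so the Jacobian has absolute determinant 1.
3. Apply the two invariance assumptions.

This is a reconstruction, not necessarily the paper's proof.

```latex
\begin{aligned}
Q_{\bm{\psi}}(\bm{q}_\pi|\tau)
  &= \int Q_{\bm{\psi}}(\bm{q}_\pi|\tau,\bm{z})\,Q_{\bm{\psi}}(\bm{z}|\tau)\,d\bm{z} \\
  &= \int Q_{\bm{\psi}}(\bm{q}_\pi|\tau,\bm{z}_\pi)\,Q_{\bm{\psi}}(\bm{z}_\pi|\tau)\,d\bm{z}
     && (\bm{z}\mapsto\bm{z}_\pi,\ |\det| = 1) \\
  &= \int Q_{\bm{\psi}}(\bm{q}|\tau,\bm{z})\,Q_{\bm{\psi}}(\bm{z}|\tau)\,d\bm{z}
     && \text{(invariance of both factors)} \\
  &= Q_{\bm{\psi}}(\bm{q}|\tau).
\end{aligned}
```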

Figures (7)

  • Figure 1: An overview of VBPI-SIBranch for a five-leaf phylogenetic tree. We begin with topological node embeddings [Zhang2023learnable] (upper left) and apply GNNs to obtain edge features. These features, concatenated with the i.i.d. hidden variables, are then fed into $\mathrm{MLP}^\mu$ and $\mathrm{MLP}^\sigma$ to produce the parameters of the branch length distributions.
  • Figure 2: Visualization of the training processes of different methods for VBPI. Left: evidence lower bound (ELBO, estimated using $J=1000$ extra samples) as a function of iterations on DS1. Middle: 10-sample lower bound (LB-10, estimated using $J=1000$ extra samples) as a function of iterations on DS1. Right: time cost per 10 training iterations of different methods on a single core of an Intel Xeon Platinum 9242 processor. The results are averaged over 100 runs, with the standard deviation shown as the error bar.
  • Figure 3: Inference gaps on tree topologies in the 95% credible set of DS1. $L(Q_{\bm{\psi}}|\tau)$ denotes the ELBO of the variational approximation, and $L(Q_{\bm{\psi}^\ast}|\tau)$ denotes the best ELBO achievable by the corresponding variational family. All lower bounds were computed by averaging over 10000 Monte Carlo samples. The ground truth marginal log-likelihood $\log P(\bm{Y}|\tau)$ is estimated using the generalized stepping-stone (GSS) algorithm [Fan2010GSS].
  • Figure 4: Branch length approximation accuracy of different methods for VBPI on DS1. Left/Middle: The TV distance and KL divergence between the branch length variational distribution and the ground truth on individual tree topologies. Right: the effective sample size of the importance sampling estimation of $Q_{\bm{\psi}}(\bm{q}|\tau)$ in VBPI-SIBranch. To simplify computation, the TV distance and KL divergence are defined as $\sum_{e\in E(\tau)}D_{\mathrm{TV}}(Q_{\bm{\psi}}(q_e|\tau)\|P(q_e|\tau,\bm{Y}))$ and $\sum_{e\in E(\tau)}D_{\mathrm{KL}}(Q_{\bm{\psi}}(q_e|\tau)\|P(q_e|\tau,\bm{Y}))$, respectively, where one million samples are drawn from each distribution. The ground truth samples are gathered from a long MrBayes run with 4 chains for one billion iterations and sampled every 100 iterations.
  • Figure 5: Selected marginal branch length variational distributions obtained by different methods on tree 36 of DS1. For each method, we estimated the probability density function with one million samples.
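The evaluation quantities in these captions reduce to a few lines of NumPy. The sketch rests on these assumptions:

- The log importance weights $\log P(\bm{Y},\tau,\bm{q})-\log Q_{\bm{\psi}}(\tau,\bm{q})$ are already available. For a semi-implicit $Q_{\bm{\psi}}$, the log density itself would have to be estimated.
- LB-10 averages $J$ independent groups of $K=10$ samples.
- The per-edge TV distances in Figure 4 can be approximated with histograms on shared bins.
- All function names are hypothetical.

The effective sample size uses the standard Kish formula $(\sum_i w_i)^2/\sum_i w_i^2$.

```python
import numpy as np

def log_mean_exp(a, axis=-1):
    """Numerically stable log(mean(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(a - m), axis=axis))

def multi_sample_lower_bound(log_w):
    """K-sample lower bound averaged over groups.

    log_w: (J, K) log importance weights, one row per independent group.
    K = 1 gives the ordinary ELBO estimate; K = 10 corresponds to LB-10.
    """
    return float(np.mean(log_mean_exp(log_w, axis=1)))

def effective_sample_size(log_w):
    """Kish effective sample size of the (flattened) importance weights."""
    w = np.exp(np.ravel(log_w) - np.max(log_w))     # rescale to avoid overflow
    return float(w.sum() ** 2 / np.sum(w ** 2))

def summed_marginal_tv(samples_q, samples_p, bins=100):
    """Sum over edges of histogram-based TV distances between marginals.

    samples_q, samples_p: (n_samples, n_edges) branch length draws.
    """
    total = 0.0
    for e in range(samples_q.shape[1]):
        a, b = samples_q[:, e], samples_p[:, e]
        # Shared bin edges so the two histograms are directly comparable.
        grid = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), bins + 1)
        ha, _ = np.histogram(a, bins=grid)
        hb, _ = np.histogram(b, bins=grid)
        total += 0.5 * np.abs(ha / ha.sum() - hb / hb.sum()).sum()
    return float(total)
```

Within each group, `log_mean_exp` is at least the group's average log weight by Jensen's inequality. This is why LB-10 sits above the ELBO in expectation. The histogram TV estimate depends on the bin count, so it is only a rough stand-in for whatever estimator the paper used.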

Theorems & Definitions (7)

  • Definition 1: Permutation Invariance
  • Proposition 1
  • Proof
  • Theorem 1: Identifiability [Zhang2023learnable]
  • Theorem 2
  • Theorem 3
  • Definition 2: Subsplit Bayesian Network