PhyloVAE: Unsupervised Learning of Phylogenetic Trees via Variational Autoencoders

Tianyu Xie; Harry Richman; Jiansi Gao; Frederick A. Matsen; Cheng Zhang

TL;DR

PhyloVAE tackles the challenge of representing and generatively modeling discrete phylogenetic tree topologies by introducing a linear-time encoding from trees to vectors and a non-autoregressive variational autoencoder that leverages learnable topological features. It provides a probabilistic framework where a latent variable z with a standard Gaussian prior explains tree topologies through p(s(τ)|z), while q(z|τ) is inferred from topology-aware embeddings, trained with a multi-sample bound L_K using the reparameterization trick. The approach yields a visualization-friendly latent space and enables high-resolution density estimation of tree topologies, outperforming autoregressive baselines in speed and matching or surpassing existing methods in representation quality. Across simulated and real data, PhyloVAE demonstrates robust latent separation of topology shapes, convergence signals across multiple analyses, and scalable generative modeling on benchmark datasets, with practical implications for phylogenetic placement and comparative analyses.
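To make the training objective concrete, here is a minimal NumPy sketch of a multi-sample lower bound estimated with the reparameterization trick. It is illustrative only and is not the authors' implementation. It assumes L_K takes the standard importance-weighted form, log of the average over K samples of p(s(τ)|z) p(z) / q(z|τ), and that the non-autoregressive decoder factorizes over the entries of the encoding s(τ). The linear decoder, the fixed number of choices per entry, and the names `multi_sample_bound` and `decoder_weights` are all hypothetical stand-ins.

```python
import numpy as np


def log_normal(z, mean, log_std):
    """Log density of a diagonal Gaussian, summed over the last axis."""
    var = np.exp(2.0 * log_std)
    per_dim = -0.5 * np.log(2.0 * np.pi) - log_std - 0.5 * (z - mean) ** 2 / var
    return np.sum(per_dim, axis=-1)


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def multi_sample_bound(s, mean, log_std, decoder_weights, K, rng):
    """Monte Carlo estimate of a K-sample lower bound for one encoded tree.

    s               : int array (P,), toy stand-in for the encoding s(tau)
    mean, log_std   : arrays (d,), parameters of q(z | tau)
    decoder_weights : array (P, C, d), hypothetical linear decoder
    """
    # Reparameterization trick: z = mean + std * eps with eps ~ N(0, I).
    eps = rng.standard_normal((K, mean.shape[0]))
    z = mean + np.exp(log_std) * eps  # shape (K, d)

    # Assumed non-autoregressive decoder: given z, the entries of s are
    # independent categoricals, so all positions are scored in parallel.
    logits = np.einsum("pcd,kd->kpc", decoder_weights, z)  # shape (K, P, C)
    log_probs = log_softmax(logits)
    log_lik = log_probs[:, np.arange(s.shape[0]), s].sum(axis=-1)  # shape (K,)

    # Importance weights p(s | z) p(z) / q(z | tau) in log space,
    # with a standard Gaussian prior on z.
    log_w = log_lik + log_normal(z, 0.0, 0.0) - log_normal(z, mean, log_std)

    # log of the mean weight, computed stably (log-sum-exp).
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))


# Toy usage with made-up sizes: 4 encoded decisions, 5 choices each, 2-D latent.
rng = np.random.default_rng(0)
s = np.array([0, 2, 1, 4])
weights = 0.1 * rng.standard_normal((4, 5, 2))
estimate = multi_sample_bound(s, np.zeros(2), np.zeros(2), weights, K=16, rng=rng)
print("estimated bound:", estimate)
```

A real tree encoding would have a different number of admissible choices at each entry, and both the encoder and decoder would be neural networks trained by gradient ascent on this bound; the sketch fixes these details only to keep the arithmetic visible.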

Abstract

Learning informative representations of phylogenetic tree structures is essential for analyzing evolutionary relationships. Classical distance-based methods have been widely used to project phylogenetic trees into Euclidean space, but they are often sensitive to the choice of distance metric and may lack sufficient resolution. In this paper, we introduce phylogenetic variational autoencoders (PhyloVAEs), an unsupervised learning framework designed for representation learning and generative modeling of tree topologies. Leveraging an efficient encoding mechanism inspired by autoregressive tree topology generation, we develop a deep latent-variable generative model that facilitates fast, parallelized topology generation. PhyloVAE combines this generative model with a collaborative inference model based on learnable topological features, allowing for high-resolution representations of phylogenetic tree samples. Extensive experiments demonstrate PhyloVAE's robust representation learning capabilities and fast generation of phylogenetic tree topologies.
