An efficient solution to Hidden Markov Models on trees with coupled branches
Farzan Vafa, Sahand Hormoz
TL;DR
The paper addresses Hidden Markov Models on trees with coupled branches, a realistic setting for biological lineages in which sister cells are statistically dependent. It develops a dynamic-programming framework that extends forward-backward and Viterbi-style methods to trees with coupling, achieving a complexity of $O(|T|N^{n+1})$ for a tree with $|T|$ nodes, $N$ hidden states, and $n$ children per node (hence $O(|T|N^3)$ for binary trees), while incorporating scaling to avoid underflow. An EM-based learning procedure estimates the parameters $a$, $b$, and $\pi$ (in standard HMM notation, the transition, emission, and initial-state probabilities) with explicit, numerically stable update formulas, and a framework of self-consistency checks validates the model assumptions. The work includes a Python implementation and simulations showing reliable parameter recovery and practical applicability to lineage-like data, enabling more faithful inference of hierarchical biological processes.
Abstract
Hidden Markov Models (HMMs) are powerful tools for modeling sequential data, where the underlying states evolve in a stochastic manner and are only indirectly observable. Traditional HMM approaches are well-established for linear sequences, and have been extended to other structures such as trees. In this paper, we extend the framework of HMMs on trees to address scenarios where the tree-like structure of the data includes coupled branches -- a common feature in biological systems where entities within the same lineage exhibit dependent characteristics. We develop a dynamic programming algorithm that efficiently solves the likelihood, decoding, and parameter learning problems for tree-based HMMs with coupled branches. Our approach scales polynomially with the number of states and nodes, making it computationally feasible for a wide range of applications, and it does not suffer from the underflow problem. We demonstrate our algorithm by applying it to simulated data and propose self-consistency checks for validating the assumptions of the model used for inference. This work not only advances the theoretical understanding of HMMs on trees but also provides a practical tool for analyzing complex biological data where dependencies between branches cannot be ignored.
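To make the likelihood problem concrete, the following is a minimal sketch of a scaled upward (leaves-to-root) pass on a binary tree with coupled daughters. It is an illustration under stated assumptions, not the authors' implementation: the function name `log_likelihood`, the dictionary-based tree encoding, and the parameterization `a[i, j, k]` (the joint probability that the two daughters are in states `j` and `k` given a parent in state `i`) are all hypothetical choices made here. Each internal node costs $O(N^3)$ operations, which matches the binary-tree scaling quoted above, and each message is normalized so that only logarithms of the scale factors are accumulated.

```python
import numpy as np

def log_likelihood(children, obs, root, pi, a, b):
    """Log-likelihood of observations on a binary tree with coupled daughters.

    Assumed (hypothetical) conventions:
      children[v] -- (left, right) tuple, or None if v is a leaf
      obs[v]      -- integer observation symbol at node v
      pi[i]       -- probability that the root is in state i
      a[i, j, k]  -- joint probability of daughter states (j, k) given parent i
      b[i, o]     -- probability of emitting symbol o from state i
    """
    # Pre-order traversal; reversing it visits every child before its parent.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        if children[v] is not None:
            stack.extend(children[v])

    beta = {}        # scaled upward messages, one length-N vector per node
    log_scale = 0.0  # running sum of the logs of the scale factors
    for v in reversed(order):
        if children[v] is None:
            beta[v] = np.ones(len(pi))  # nothing observed below a leaf
            continue
        left, right = children[v]
        w_left = b[:, obs[left]] * beta[left]
        w_right = b[:, obs[right]] * beta[right]
        # Sum over both daughter states jointly: O(N^3) per internal node.
        msg = np.einsum("ijk,j,k->i", a, w_left, w_right)
        c = msg.sum()
        beta[v] = msg / c        # rescale to avoid underflow
        log_scale += np.log(c)

    root_term = np.sum(pi * b[:, obs[root]] * beta[root])
    return log_scale + np.log(root_term)

# Toy example: root 0 with daughters 1 and 2; two hidden states, two symbols.
children = {0: (1, 2), 1: None, 2: None}
obs = {0: 0, 1: 1, 2: 1}
pi = np.array([0.6, 0.4])
b = np.array([[0.9, 0.1],
              [0.2, 0.8]])
a = np.array([[[0.7, 0.1], [0.1, 0.1]],
              [[0.1, 0.1], [0.1, 0.7]]])  # each a[i] sums to 1
print(log_likelihood(children, obs, 0, pi, a, b))
```

If the daughters were independent given the parent, `a[i, j, k]` would factor into a product of two ordinary transition matrices and the per-node cost would drop to $O(N^2)$; keeping the full three-index tensor is what allows the sketch to represent coupling between sister branches.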
