α-HMM: A Graphical Model for RNA Folding
Sixiang Zhang, Aaron J. Yang, Liming Cai
TL;DR
This work addresses RNA secondary structure prediction, including challenging pseudoknots, by introducing the arbitrary-order hidden Markov model (α-HMM), which uses a probabilistic influence graph (PIG) to model long-range dependencies between nucleotide events. The approach generalizes conventional HMMs by allowing influences from historically distant states while still permitting efficient decoding by dynamic programming (DP), with time complexity $O(n^3)$ (and $O(n^3|S|^2)$ for the implemented DP). The paper provides a concrete instantiation, a 4-state $\alpha_1$-HMM that models stems, loops, and composite stems (parallel, nested, crossing), and outlines a parameter-estimation strategy that ties stem stability to Boltzmann energy, yielding concrete values for $\alpha$, $\delta$, and $\beta$ and a method to compute base-pair odds via $S(x,y)$. The α-HMM framework is shown to subsume the capabilities of stochastic context-free grammars (SCFGs) and can be extended to include stacked base pairs, offering a flexible and expressive approach to RNA secondary structure prediction, with practical implications for understanding RNA biology and designing predictive tools.
Abstract
RNA secondary structure is modeled with the novel arbitrary-order hidden Markov model (α-HMM). The α-HMM extends the traditional HMM with the capability to model stochastic events that may be influenced by historically distant ones, making it suitable for capturing the long-range canonical base pairings between nucleotides that constitute RNA secondary structure. Unlike previous heavyweight extensions of the HMM, the α-HMM can restrict how one event may influence another in a stochastic process, enabling efficient prediction of RNA secondary structure, including pseudoknots.
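To make the terms "canonical base pairing" and "pseudoknot" concrete, the following minimal Python sketch checks whether two nucleotides can pair and whether a set of base pairs contains a crossing. It is an illustration only and not part of the α-HMM itself. The function names `is_canonical` and `has_pseudoknot` are hypothetical, and the sketch assumes that "canonical" covers the Watson-Crick pairs plus the G-U wobble pair and that base pairs are given as 0-based index tuples.

```python
from itertools import combinations

# Assumption: "canonical" means Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def is_canonical(x: str, y: str) -> bool:
    """Return True if nucleotides x and y can form a canonical base pair."""
    return (x, y) in CANONICAL


def has_pseudoknot(pairs) -> bool:
    """Return True if any two base pairs (i, j) and (k, l) cross, i.e. i < k < j < l.

    `pairs` is an iterable of index tuples; each tuple is normalized so that
    its smaller index comes first before the crossing test is applied.
    """
    normalized = sorted(tuple(sorted(p)) for p in pairs)
    return any(i < k < j < l for (i, j), (k, l) in combinations(normalized, 2))


# Nested and parallel stems contain no crossing; a crossing pair of stems does.
nested = [(0, 7), (1, 6)]
parallel = [(0, 3), (4, 7)]
crossing = [(0, 4), (2, 6)]
```

Under these assumptions, `has_pseudoknot(nested)` and `has_pseudoknot(parallel)` are both false, while `has_pseudoknot(crossing)` is true. Crossing pairs are the long-range dependencies that a model restricted to nested structure cannot represent.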
