α-HMM: A Graphical Model for RNA Folding

Sixiang Zhang, Aaron J. Yang, Liming Cai

TL;DR

This work addresses RNA secondary structure prediction, including challenging pseudoknots, by introducing the arbitrary-order hidden Markov model (α-HMM) that uses a probabilistic influence graph (PIG) to model long-range dependencies between nucleotide events. The approach generalizes conventional HMMs by allowing influences from historically distant states, enabling efficient, DP-based decoding with time complexity $O(n^3)$ (and $O(n^3|S|^2)$ for the implemented DP). The paper provides a concrete instantiation with a 4-state $\alpha_1$-HMM to model stems, loops, and composite stems (parallel, nested, crossing), and outlines a parameter-estimation strategy that ties stem stability to Boltzmann energy, yielding concrete values for $\alpha$, $\delta$, and $\beta$ and a method to compute base-pair odds via $S(x,y)$. The α-HMM framework is shown to subsume SCFG capabilities and is extendable to include stacked base pairs, offering a flexible, expressive, and potentially more general approach for RNA secondary structure prediction with practical implications for understanding RNA biology and designing predictive tools.

Abstract

RNA secondary structure is modeled with the novel arbitrary-order hidden Markov model (α-HMM). The α-HMM extends the traditional HMM with the capability to model stochastic events that may be influenced by historically distant ones, making it suitable for capturing the long-range canonical base pairings between nucleotides that constitute RNA secondary structure. Unlike previous heavyweight extensions of the HMM, the α-HMM can restrict how one event may influence another in a stochastic process, enabling efficient prediction of RNA secondary structure, including pseudoknots.
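To make the terms "canonical base pairing" and "pseudoknot" concrete, the following minimal Python sketch checks whether two nucleotides can pair and classifies how two base pairs relate positionally (parallel, nested, or crossing). It is an illustration only and not the α-HMM decoder. The names `is_canonical`, `relation`, and `has_pseudoknot` are hypothetical, and treating the G-U wobble pair as canonical is an assumption based on common convention that may differ from the paper's.

```python
from itertools import combinations

# Assumption: "canonical" covers the Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def is_canonical(x: str, y: str) -> bool:
    """Return True if nucleotides x and y form a canonical pair (case-insensitive)."""
    return (x.upper(), y.upper()) in CANONICAL


def relation(p, q):
    """Classify two base pairs, given as position tuples, that share no position."""
    (i, j), (k, l) = sorted([tuple(sorted(p)), tuple(sorted(q))])
    if len({i, j, k, l}) < 4:
        raise ValueError("base pairs must not share a position")
    # After sorting we have i < j, k < l, and i < k.
    if j < k:
        return "parallel"   # i < j < k < l
    if l < j:
        return "nested"     # i < k < l < j
    return "crossing"       # i < k < j < l


def has_pseudoknot(pairs):
    """A structure contains a pseudoknot if any two of its base pairs cross."""
    return any(relation(p, q) == "crossing" for p, q in combinations(pairs, 2))
```

For example, `relation((0, 5), (3, 8))` is `"crossing"`, so a structure containing both pairs is pseudoknotted. Any model restricted to nested and parallel pairs cannot represent such a structure.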

Paper Structure (11 sections, 13 equations, 5 figures, 1 table)

This paper contains 11 sections, 13 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: A schematic illustration of a PIG modeling a simplified subset of English sentences. (a) States are within oval circles which can emit desirable words and phrases (emissions are not shown); solid arrows are transitions, and dotted (colored) arrows are influences. (b) A walk $\rho$ of length 12 starting from state time to generate the observed sentence, with influences labeled among the instantiated states on the walk $\rho$.
  • Figure 2: (A) Secondary structure of tRNA, where canonical pairs of nucleotides are connected with a single dot '.' whereas non-canonical nucleotide interactions are represented by longer dotted lines. The four stems are color-coded to correspond to the four double helices in its 3D structure in (B). (C) Various structural elements in RNA secondary structure, including stem, hairpin loop, bulge loop, internal loop, multi-loop, and pseudoknot.
  • Figure 3: RNA composite stems. (A) stem-loop, (B) parallel stems, (C) nested stems, and (D) crossing stems. The rules for forming composite stems can repeatedly be applied to any unpaired loop region.
  • Figure 4: A simplified 4-state PIG for RNA secondary structure including pseudoknots. RNA sequences are assumed to be "standard" (with equal base composition for 4 nucleotides), i.e., every state has the same emission probability $\epsilon = 1/4$ for the 4 nucleotides. Three (different) influence functions $\eta_1, \eta_2,$ and $\eta_3$ are given as colored directed edges and their probability "standard" distributions are summarized as the same distribution $\eta$ in the table. Transition function $\tau$ is given as the black directed edges. The values for probability parameters $\alpha$, $\beta$, and $\delta$ are to be determined in section 3.2.
  • Figure 5: Illustration of how walk $\rho^{(j)}$ up to step $j$ is built upon the established walk $\rho^{(j-1)}$ up to step $j-1$ (gray drawings). (A) A potential new influence to step $j$ from step $l$, as long as the latter has not already been an influencer. (B) When state $r$ is affiliated with its predecessor state $s$, the influence from step $l$ to step $j$ needs to be established upon the influence from step $l+1$ to step $j-1$, should $j-1$ be influenced in walk $\rho^{(j-1)}$.

Theorems & Definitions (7)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Definition 7