Endhered patterns in matchings and RNA
Célia Biane, Greg Hampikian, Sergey Kirgizov, Khaydar Nurligareev
TL;DR
The paper defines endhered patterns in matchings and analyzes the distribution of patterns of size 2 and 3. It establishes equidistribution under endhered twists and derives recurrences, exponential generating functions, and asymptotics, including a Poisson limit for size-2 patterns. It then compares these theoretical results with native RNA secondary structures with pseudoknots, showing that such patterns are relatively rare in real data and that reducing structures to RNA shapes clarifies which patterns are present. The work highlights a notable gap between unrestricted combinatorial models and empirical RNA patterns, suggests that pattern-based constraints are needed to better capture RNA folding and pseudoknot biology, and outlines directions for extending the approach to more complex patterns and datasets.
Abstract
An endhered (end-adhered) pattern is a subset of arcs in a matching such that the starting points of the arcs are consecutive and the same holds for their ending points. Such patterns are in one-to-one correspondence with permutations. We focus on the occurrence frequency of such patterns in matchings and in native (real-world) RNA structures with pseudoknots. We present combinatorial results on the distribution and asymptotic behavior of the pattern 21, which corresponds to two consecutive base pairs frequently encountered in RNA, and the pattern 12, which represents the archetypal minimal pseudoknot. We show that these two patterns are equidistributed in matchings, which is quite different from what we find in native RNAs. We also examine the distribution of endhered patterns of size 3, showing how the patterns change under a transformation called the endhered twist. Finally, we compute the distributions of endhered patterns of size 2 and 3 in native RNA secondary structures with pseudoknots and discuss possible outcomes of our study.
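To make the definition above concrete, here is a minimal Python sketch that counts occurrences of an endhered pattern in a matching and checks the stated equidistribution of 12 and 21 by brute force for a small size. It is an illustration, not the authors' code. It assumes that a matching is given as a list of arcs (i, j) on the points 1..2n, that the "starting point" of an arc is its smaller endpoint, and that a pattern is written as a tuple such as (2, 1). The function names `count_endhered`, `all_matchings`, and `distribution` are hypothetical.

```python
from collections import Counter


def count_endhered(matching, pattern):
    """Count occurrences of an endhered pattern, e.g. (2, 1), in a matching.

    Assumption: arcs are pairs (i, j) on points 1..2n, each point used once,
    and the starting point of an arc is its smaller endpoint.
    """
    # Map each starting point to the ending point of its arc.
    partner = {min(i, j): max(i, j) for i, j in matching}
    k = len(pattern)
    count = 0
    for s in partner:
        # The k arcs must start at the consecutive points s, s+1, ..., s+k-1.
        if not all(s + t in partner for t in range(k)):
            continue
        ends = [partner[s + t] for t in range(k)]
        base = min(ends)
        # The ends must be consecutive and ordered as the permutation dictates.
        if [e - base + 1 for e in ends] == list(pattern):
            count += 1
    return count


def all_matchings(points):
    """Yield every perfect matching of a sorted list of points."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for idx, other in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for tail in all_matchings(remaining):
            yield [(first, other)] + tail


def distribution(n, pattern):
    """Tally how many matchings with n arcs contain 0, 1, 2, ... occurrences."""
    points = list(range(1, 2 * n + 1))
    return Counter(count_endhered(m, pattern) for m in all_matchings(points))


# Nested arcs (1,4),(2,3) form pattern 21; crossing arcs (1,3),(2,4) form 12.
print(count_endhered([(1, 4), (2, 3)], (2, 1)))
print(count_endhered([(1, 3), (2, 4)], (1, 2)))
# Brute-force comparison of the two distributions over all matchings with 3 arcs.
print(distribution(3, (1, 2)) == distribution(3, (2, 1)))
```

The enumeration grows as (2n-1)!!, so this check is only practical for small n; it serves as a sanity check of the definitions rather than a substitute for the recurrences and generating functions derived in the paper.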
