Towards Optimal Grammars for RNA Structures
Evarista Onokpasa, Sebastian Wild, Prudence W. H. Wong
TL;DR
This work develops an automated framework that searches for stochastic grammars modeling RNA sequence-structure data, for use in joint compression and ab initio structure prediction. It combines an exhaustive search over small grammars in stochastic RNA Form (SRF), a normal form for such grammars, with a random-search component, and finds that some automatically discovered grammars surpass human-expert designs in compression efficiency. The study provides reference implementations and motivates an open contest for optimal RNA grammars. Overall, the approach advances compression-driven RNA structure modeling and suggests directions for scalable, learning-based grammar discovery.
Abstract
In past work (Onokpasa, Wild, Wong, DCC 2023), we showed that (a) stochastic context-free grammars are the best known compressors for joint compression of RNA sequence and structure, and (b) grammars with better compression ability also perform better in ab initio structure prediction. Previous grammars were manually curated by human experts. In this work, we develop a framework for the automatic and systematic search for stochastic grammars with better compression (and prediction) ability for RNA. We perform an exhaustive search of small grammars and identify grammars that surpass the performance of human-expert grammars.
