
Rethinking Performance Measures of RNA Secondary Structure Problems

Frederic Runge, Jörg K. H. Franke, Daniel Fertmann, Frank Hutter

TL;DR

The paper addresses the mismatch between conventional evaluation metrics for RNA secondary structure prediction and the structural biology they are meant to capture. It proposes the Weisfeiler-Lehman graph kernel (WL) as a graph-based metric for assessing RNA predictions, addressing gaps left by the F1 score and MCC. Through benchmark evaluation, structural shift and bulge migration analyses, and RNA design experiments, WL is shown to provide more informative, sequence-aware similarity assessments and to guide design improvements. WL has limitations (e.g., it does not capture base stacking); extending it with edge weights and using graph neural network surrogates could enable differentiable training and further improve evaluation in RNA biology.

Abstract

Accurate RNA secondary structure prediction is vital for understanding cellular regulation and disease mechanisms. Deep learning (DL) methods have surpassed traditional algorithms by predicting complex features like pseudoknots and multi-interacting base pairs. However, traditional distance measures can hardly deal with such tertiary interactions and the currently used evaluation measures (F1 score, MCC) have limitations. We propose the Weisfeiler-Lehman graph kernel (WL) as an alternative metric. Embracing graph-based metrics like WL enables fair and accurate evaluation of RNA structure prediction algorithms. Further, WL provides informative guidance, as demonstrated in an RNA design experiment.
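The summary does not describe how WL is computed. As a rough illustration only, and not the authors' implementation, the sketch below shows one common form of the idea: a normalized Weisfeiler-Lehman subtree kernel on an RNA graph. The following are assumptions made for this example: nodes are nucleotides labeled with their base, edges are backbone neighbors plus base pairs, three refinement rounds are used, and the kernel is cosine-normalized. All function names and the example sequence are hypothetical.

```python
from collections import Counter
from math import sqrt


def parse_dot_bracket(structure):
    """Return base pairs (i, j) of a nested dot-bracket string (no pseudoknots)."""
    stack, pairs = [], set()
    for j, char in enumerate(structure):
        if char == "(":
            stack.append(j)
        elif char == ")":
            pairs.add((stack.pop(), j))
    return pairs


def rna_graph(sequence, pairs):
    """Build adjacency sets from backbone and base-pair edges.

    `pairs` may be any iterable of (i, j) index pairs, so crossing pairs
    (pseudoknots) and positions with several partners are representable.
    """
    n = len(sequence)
    adjacency = {i: set() for i in range(n)}
    for i in range(n - 1):  # backbone edges
        adjacency[i].add(i + 1)
        adjacency[i + 1].add(i)
    for i, j in pairs:  # base-pair edges
        adjacency[i].add(j)
        adjacency[j].add(i)
    labels = {i: sequence[i] for i in range(n)}  # nucleotide as initial label
    return adjacency, labels


def wl_features(adjacency, labels, iterations=3):
    """Count node labels over all WL refinement rounds (round 0 = nucleotides)."""
    features = Counter(labels.values())
    for _ in range(iterations):
        # New label = own label plus the sorted multiset of neighbor labels.
        # Nested tuples grow with each round; fine for a few iterations.
        labels = {
            node: (labels[node], tuple(sorted(labels[nb] for nb in adjacency[node])))
            for node in adjacency
        }
        features.update(labels.values())
    return features


def wl_similarity(seq_a, pairs_a, seq_b, pairs_b, iterations=3):
    """Cosine-normalized WL subtree kernel; 1.0 for identical labeled graphs."""
    fa = wl_features(*rna_graph(seq_a, pairs_a), iterations)
    fb = wl_features(*rna_graph(seq_b, pairs_b), iterations)
    dot = sum(count * fb[label] for label, count in fa.items())
    norm = sqrt(sum(c * c for c in fa.values()) * sum(c * c for c in fb.values()))
    return dot / norm


sequence = "GGGAAACCC"  # hypothetical toy sequence
hairpin = parse_dot_bracket("(((...)))")
unfolded = parse_dot_bracket(".........")
print(wl_similarity(sequence, hairpin, sequence, hairpin))
print(wl_similarity(sequence, hairpin, sequence, unfolded))
```

Because the initial labels are nucleotides, two structures over the same sequence always share the round-0 features, so the score stays above zero even when no base pair matches. The values produced by this sketch are not expected to reproduce the scores reported in the paper.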

Paper Structure (20 sections, 3 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: Bi-stable RNA 20mer. The F1 score and MCC when comparing both folds is $0.0$ and $-0.026$, respectively. The Weisfeiler-Lehman graph kernel provides a score of $0.25$.
  • Figure 2: Example of structural shift. (Left) A 5S rRNA of Drosophila melanogaster. (Middle) The same structure shifted by one position. (Right) The same structure shifted by two positions.
  • Figure 3: RNA Design guided by Hamming distance (libLEARNA) or WL (libLEARNA-WL).
  • Figure 4: Bulge migration example. A simulated bulge migration process on the synthetic theophylline riboswitch construct RS3 proposed by Wachsmuth et al. (2012). The top left panel shows the original construct. With each step, the bulge in the right stem moves by one position.
  • Figure 5: Mutation example. A simulated mutation process on the synthetic theophylline riboswitch construct RS3 proposed by Wachsmuth et al. (2012). The top left panel shows the original construct. With each step (left to right, top to bottom), the following mutations are introduced: 1 base pair (bp) mutated, 2 bp, 4 bp, 8 bp, entire first stem, entire second stem, entire sequence set to 'A'.
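
The behavior that Figures 1 and 2 point to, where pair-based scores collapse as soon as no base pair matches exactly, can be reproduced with a small sketch. It assumes one common convention: F1 and MCC are computed from a confusion matrix over all unordered position pairs of the sequence. The toy structures and function names are hypothetical, and the paper may count pairs differently.

```python
from math import sqrt


def base_pairs(structure):
    """Return base pairs (i, j) of a nested dot-bracket string."""
    stack, pairs = [], set()
    for j, char in enumerate(structure):
        if char == "(":
            stack.append(j)
        elif char == ")":
            pairs.add((stack.pop(), j))
    return pairs


def f1_and_mcc(true_structure, predicted_structure):
    """F1 and MCC over all n*(n-1)/2 unordered position pairs."""
    n = len(true_structure)
    true_pairs = base_pairs(true_structure)
    pred_pairs = base_pairs(predicted_structure)
    tp = len(true_pairs & pred_pairs)
    fp = len(pred_pairs - true_pairs)
    fn = len(true_pairs - pred_pairs)
    tn = n * (n - 1) // 2 - tp - fp - fn
    # Two empty structures are treated as a perfect match here (an assumption).
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return f1, mcc


original = "((((....))))...."  # hypothetical 16-nt hairpin
shifted = ".((((....))))..."   # same hairpin moved by one position
print(f1_and_mcc(original, original))
print(f1_and_mcc(original, shifted))
```

In this toy case the one-position shift leaves no pair in common, so F1 is 0 and MCC is slightly negative, even though the two structures have the same shape.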