Path Signatures Enable Model-Free Mapping of RNA Modifications

Maud Lemercier, Paola Arrubarrena, Salvatore Di Giorgio, Julia Brettschneider, Thomas Cass, Valerie Griesche, Isabel S. Naarmann-de Vries, Anastasia Papavasiliou, Alessia Ruggieri, Irem Tellioglu, Chia Ching Wu, F. Nina Papavasiliou, Terry Lyons

TL;DR

This work introduces a model-free computational method that reframes modification detection as an anomaly detection problem, requiring only canonical (unmodified) RNA reads without any other annotated data, and applies this framework to dengue virus transcripts and mammalian mRNAs.

Abstract

Detecting chemical modifications on RNA molecules remains a key challenge in epitranscriptomics. Traditional reverse transcription-based sequencing methods introduce enzyme- and sequence-dependent biases and fragment RNA molecules, confounding the accurate mapping of modifications across the transcriptome. Nanopore direct RNA sequencing offers a powerful alternative by preserving native RNA molecules, enabling the detection of modifications at single-molecule resolution. However, current computational tools can identify only a limited subset of modification types within well-characterized sequence contexts for which ample training data exists. Here, we introduce a model-free computational method that reframes modification detection as an anomaly detection problem, requiring only canonical (unmodified) RNA reads without any other annotated data. For each nanopore read, our approach extracts robust, modification-sensitive features from the raw ionic current signal at a site using the signature transform, then computes an anomaly score by comparing the resulting feature vector to its nearest neighbors in an unmodified reference dataset. We convert anomaly scores into statistical p-values to enable anomaly detection at both individual read and site levels. Validation on densely modified E. coli rRNA demonstrates that our approach detects known sites harboring diverse modification types, without prior training on these modifications. We further applied this framework to dengue virus (DENV) transcripts and mammalian mRNAs. For DENV sfRNA, it revealed a novel 2'-O-methylated site, which we validated orthogonally by qRT-PCR assays. These results demonstrate that our model-free approach operates robustly across different types of RNAs and datasets generated with different nanopore sequencing chemistries.
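
The per-read scoring described in the abstract can be illustrated with a short sketch. It is not the authors' implementation: it assumes a time-augmented two-dimensional path, a signature truncated at depth 2, Euclidean distance, k = 5 neighbors, and a split of the unmodified reads into a reference set and a calibration set. None of these choices is stated in the abstract, and all function names are hypothetical.

```python
import numpy as np

def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path with shape (n_points, d)."""
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)        # segment increments, (n-1, d)
    offsets = path[:-1] - path[0]             # displacement before each segment
    level1 = increments.sum(axis=0)           # total displacement
    # Iterated integrals: sum over segments of offset (x) dx + 0.5 * dx (x) dx
    level2 = offsets.T @ increments + 0.5 * increments.T @ increments
    return np.concatenate([level1, level2.ravel()])

def embed(signal):
    """Assumed featurisation: pair the current with a time channel, then take the signature."""
    signal = np.asarray(signal, dtype=float)
    t = np.linspace(0.0, 1.0, len(signal))
    return signature_depth2(np.column_stack([t, signal]))

def knn_score(x, reference, k=5):
    """Anomaly score: mean Euclidean distance to the k nearest reference features."""
    dists = np.linalg.norm(np.asarray(reference) - x, axis=1)
    return np.sort(dists)[:k].mean()

def conformal_pvalue(score, cal_scores):
    """Conformal p-value: rank of the test score among held-out unmodified scores."""
    cal_scores = np.asarray(cal_scores)
    return (1 + np.sum(cal_scores >= score)) / (len(cal_scores) + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for current stretches at one site (hypothetical data).
    reference = np.array([embed(rng.normal(size=30)) for _ in range(200)])
    cal = [knn_score(embed(rng.normal(size=30)), reference) for _ in range(100)]
    # The signature ignores a constant offset, so the simulated anomaly
    # is a level change in the middle of the read.
    read = rng.normal(size=30)
    read[15:] += 3.0
    print(conformal_pvalue(knn_score(embed(read), reference), cal))
```

A small p-value means the read's score is larger than almost all scores obtained from held-out unmodified reads at the same site, which is the sense in which a read is called anomalous.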

Paper Structure

This paper contains 26 sections, 13 equations, 11 figures.

Figures (11)

  • Figure 1: Schematic of the workflow. (a) Schematic of an RNA molecule threading through a nanopore. (b) Example of an ionic current time series from a nanopore read aligned to a reference sequence. (c) 2D visualization of the signature embeddings of IVT and modified current stretches using UMAP. (d) Corresponding ionic current stretches. (e) Densities estimated with the corresponding anomaly scores. (f) Visualization of p-values per-site and per-read. (g) IGV visualization of a BedGraph file recording per-site modification inferences, such as the frequency of anomalous reads at a predefined significance threshold and the site-level p-value combining the p-values across reads at a site. (h) BED file format. n_anom@0.01: number of anomalies detected by thresholding the conformal p-values at $0.01$; Fisher_pval: Fisher's combination test p-value (testing the hypothesis that no read at a site is anomalous), where the underlying test statistic combines the n_test conformal p-values, after adjustment.
  • Figure 2: Evaluation on ribosomal RNA modifications in E. coli 23S. (a) KS test p-values comparing the native and IVT score distributions at each site. (b) AUROC values quantifying the performance of the anomaly detector. (c) Percentage of reads with a score exceeding the 0.90 quantile of the calibration scores. (d) Single-molecule (conformal) p-values with FDR control. At a site, the conformal p-values are thresholded at the BH cutoff, further corrected with Storey's estimate $\hat{\pi}_0$ of the proportion of non-anomalous reads. (e) Per-site Fisher's combination test with FDR control at level $0.05$. The heatmap shows the Fisher p-values (light gray values are non-significant). (f) Anomaly maps obtained under different multiple testing corrections (yellow dots indicate discoveries). From left to right: using conformal p-values for each read–site pair thresholded at $0.01$; conformal p-values thresholded at the BH-adjusted level; per-site BH correction applied to conformal p-values; Storey’s procedure applied.
  • Figure 3: Detecting modified sites with low stoichiometry. (a) Density plots of IVT and native scores at four modified sites, showing the strongest shift at the last site. (b) Thresholded conformal p-values around the site harboring m7G for different values of n_cal and modification level $x$. From left to right: $x=0.1, 0.2, 0.3, 1$. The red dotted line separates native from IVT reads.
  • Figure 4: DENV gRNA. (a) A 400-nt region around site 1218, previously reported as an m5C site (Wu2025.03.17.643699). Three native reads show elevated anomaly scores at this site, with conformal p-values passing a per-site BH correction at 10% FDR. (b) Distribution of anomaly scores at site 1218 for 1,600 IVT calibration reads and 50 native test reads. The two distributions are almost identical, except for the three high-scoring native reads in the tail. (c) Validation in a synthetic dengue virus (DENV) oligonucleotide bearing m5C, analyzed with the same per-site BH correction at 10% FDR. (d) Distribution of anomaly scores for the oligo dataset at site 40, based on 5,000 IVT calibration reads and 2,500 native reads.
  • Figure 5: DENV sfRNA results. (a) Heatmap showing all reads, with conformal p-values thresholded using per-site BH correction at 40% FDR. A clear signal can be observed around position 10,600. (b) qRT-PCR validation of 2'-O-methylation in DENV sfRNA. Amplification curves from qRT-PCR under low and high dNTP conditions. Left: In vitro transcribed (IVT) sfRNA shows minimal Cq shift, consistent with absence of 2'OMe. Right: sfRNA from DENV-infected cells exhibits a pronounced Cq shift under low dNTP conditions, indicative of a 2'OMe near position 10,600.
  • ...and 6 more figures
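
The site-level quantities named in the captions above (Fisher's combination test, the BH cutoff, and Storey's estimate $\hat{\pi}_0$) have standard textbook definitions, sketched below in plain NumPy. This is a sketch under assumptions, not the paper's code: the function names are hypothetical, the tuning parameter lam = 0.5 is an arbitrary choice, and the adjustment that the Figure 1 caption says is applied to the conformal p-values before combination is not described on this page, so it is omitted. The plain Fisher test assumes independent p-values.

```python
import numpy as np

def fisher_combine(pvals):
    """Fisher's combination p-value for n independent p-values.

    The statistic -2 * sum(log p) is chi-squared with 2n degrees of freedom
    under the null; for even degrees of freedom the survival function is
    exp(-h) * sum_{j<n} h**j / j!, with h equal to half the statistic.
    """
    p = np.asarray(pvals, dtype=float)
    n = p.size
    h = -np.log(p).sum()
    if h <= 0:
        return 1.0
    j = np.arange(n)
    log_fact = np.concatenate([[0.0], np.cumsum(np.log(np.arange(1, n)))])
    log_terms = j * np.log(h) - log_fact - h       # log of each series term
    top = log_terms.max()                          # log-sum-exp for stability
    return float(min(1.0, np.exp(top) * np.exp(log_terms - top).sum()))

def storey_pi0(pvals, lam=0.5):
    """Storey-type estimate of the null proportion (conservative +1 variant)."""
    p = np.asarray(pvals, dtype=float)
    return min(1.0, (1 + np.sum(p > lam)) / (p.size * (1.0 - lam)))

def benjamini_hochberg(pvals, alpha=0.05, pi0=1.0):
    """Boolean rejection mask from the BH step-up rule at level alpha / pi0."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / (m * pi0)
    passed = np.nonzero(p[order] <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if passed.size:
        # Reject every hypothesis up to the largest rank that passes.
        reject[order[: passed[-1] + 1]] = True
    return reject
```

Under these assumptions, a per-site call would pass the conformal p-values of all reads at a site to `fisher_combine`, while a per-read call would use `benjamini_hochberg(pvals, alpha, pi0=storey_pi0(pvals))` within the site.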