A Principal Submanifold-based Approach for Clustering and Multiscale RNA Correction

Menghao Wu, Zhigang Yao

TL;DR

This work proposes a novel clustering technique, the principal submanifold-based DBSCAN (PSM-DBSCAN), which achieves superior clustering accuracy and increased robustness to noise, and applies this new method for multiscale corrections, effectively resolving RNA backbone clashes at both microscopic and mesoscopic scales.

Abstract

RNA structure determination is essential for understanding its biological functions. However, the reconstruction process often faces challenges, such as atomic clashes, which can lead to inaccurate models. To address these challenges, we introduce the principal submanifold (PSM) approach for analyzing RNA data on a torus. This method provides an accurate, low-dimensional feature representation, overcoming the limitations of previous torus-based methods. By combining PSM with DBSCAN, we propose a novel clustering technique, the principal submanifold-based DBSCAN (PSM-DBSCAN). Our approach achieves superior clustering accuracy and increased robustness to noise. Additionally, we apply this new method for multiscale corrections, effectively resolving RNA backbone clashes at both microscopic and mesoscopic scales. Extensive simulations and comparative studies highlight the enhanced precision and scalability of our method, demonstrating significant improvements over existing approaches. The proposed methodology offers a robust foundation for correcting complex RNA structures and has broad implications for applications in structural biology and bioinformatics.
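To make the idea of density-based clustering on a torus concrete, here is a minimal sketch in plain Python. It is not the authors' PSM-DBSCAN: the principal submanifold fitting step is omitted entirely, and what remains is ordinary DBSCAN run with a wrapped angular distance. The names `torus_distance` and `dbscan`, the toy angle pairs, and the values of `eps` and `min_pts` are all assumptions made for illustration.

```python
import math
from collections import deque

TWO_PI = 2 * math.pi

def torus_distance(a, b):
    """Distance between two angle vectors (radians) on the flat torus."""
    total = 0.0
    for x, y in zip(a, b):
        d = abs(x - y) % TWO_PI
        d = min(d, TWO_PI - d)  # take the shorter way around the circle
        total += d * d
    return math.sqrt(total)

def dbscan(points, eps, min_pts, dist=torus_distance):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    # Brute-force neighborhoods (each point counts as its own neighbor).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels

# Toy data (hypothetical): pairs of angles in radians.
points = [
    (0.05, 0.10), (6.25, 0.05), (0.10, 6.20), (6.20, 6.25),  # straddles 0 / 2*pi
    (3.00, 3.00), (3.10, 3.05), (2.95, 3.10), (3.05, 2.90),  # interior group
    (1.50, 4.50),                                            # isolated point
]

labels = dbscan(points, eps=0.3, min_pts=3)
euclidean_labels = dbscan(points, eps=0.3, min_pts=3, dist=math.dist)
print(labels)
print(euclidean_labels)
```

In this toy data the first four points sit on either side of the 0 / 2π seam. The wrapped distance keeps them together as one cluster, while the Euclidean distance treats them as far apart and marks them as noise, which is the basic reason angle data call for torus-aware methods.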

Paper Structure

This paper contains 25 sections, 36 equations, 14 figures, 4 tables, 1 algorithm.

Figures (14)

  • Figure 1: Visualization of the PSM and tPCA. RNA backbone dihedral-angle data are intrinsically torus-valued. The figure contrasts two low-dimensional fitting strategies: PSM directly fits a low-dimensional principal submanifold on the torus, yielding an intrinsic low-dimensional structure; tPCA maps the torus data (e.g., via a TOSS mapping) to the sphere for low-dimensional fitting (e.g., PNS) and maps back, extracting a corresponding low-dimensional fitted structure on the torus.
  • Figure 2: An intuitive illustration of clustering performance on high-dimensional data using t-SNE. (a) shows three different categories of points, (b) shows the clustering result of PSM-DBSCAN, and (c) shows that of MINT-AGE.
  • Figure 3: Illustration of the dihedral angles between atoms in the RNA backbone. Reproduced from mardia2013statistical.
  • Figure 4: Visualization of an RNA backbone at the microscopic and mesoscopic scales, illustrating the seven dihedral angles of the $i$-th suite at the microscopic scale and the six centers $(k=2)$ of the sugar rings (from $s_{i-2}$ to $s_{i+3}$) that define the shape of the $i$-th suite at the mesoscopic scale.
  • Figure 6: Framework of PSM-DBSCAN-MC for RNA correction at the microscopic and mesoscopic scales.
  • ...and 9 more figures

Theorems & Definitions (9)

  • Remark 2.1
  • Remark 2.2
  • Definition D.1
  • Definition D.2
  • Definition D.3
  • Definition D.4
  • Definition D.5
  • Definition D.6
  • Definition D.7