ADRS-CNet: An adaptive dimensionality reduction selection and classification network for DNA storage clustering algorithms

Bowen Liu; Jiankun Li

TL;DR

A multilayer perceptron is trained to classify input DNA sequence features and adaptively select the most suitable dimensionality reduction method before clustering. Experiments indicate that this approach effectively mitigates the impact of the curse of dimensionality on clustering models.

Abstract

DNA storage technology offers new possibilities for massive data storage thanks to its high storage density, long-term preservation, low maintenance cost, and compact size. Improving the reliability of the stored information requires dealing with base errors and missing sequences. Currently, sequenced reads are clustered and compared to recover as much of the original sequence information as possible. However, extracting features from DNA sequences of different lengths leads to the curse of dimensionality, which must be overcome. Techniques such as PCA, UMAP, and t-SNE are commonly used to project the high-dimensional features into a low-dimensional space, but their effectiveness varies from dataset to dataset. This paper therefore proposes training a multilayer perceptron model to classify input DNA sequence features and adaptively select the most suitable dimensionality reduction method, in order to improve subsequent clustering results. In tests on open-source datasets and comparisons with various baseline methods, the model exhibits superior classification performance and significantly improves clustering outcomes. This shows that the approach effectively mitigates the impact of the curse of dimensionality on clustering models.
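
To make the dimensionality problem concrete, here is a minimal sketch that assumes k-mer counting as the feature extraction step. The abstract does not specify the paper's exact features, so the function name `kmer_count_vector`, the alphabet, and the sample reads are all hypothetical and for illustration only. It shows how the feature vector length grows as 4^k while a short read fills only a handful of entries.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed nucleotide alphabet


def kmer_count_vector(seq, k):
    """Count overlapping k-mers of `seq` into a vector of length 4**k."""
    # Map every possible k-mer to a fixed position in the vector.
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    vec = [0] * len(index)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip windows with symbols outside ACGT, e.g. N
            vec[index[kmer]] += 1
    return vec


# Hypothetical reads, not taken from the paper's datasets.
reads = ["ACGTACGTAC", "ACGTTCGTAC", "GGGTTTAAAC"]

for k in (2, 4, 6):
    vectors = [kmer_count_vector(r, k) for r in reads]
    dim = len(vectors[0])
    nonzero = sum(1 for x in vectors[0] if x)
    print(f"k={k}: dimension={dim}, nonzero entries in first read={nonzero}")
```

Vectors this sparse and this wide are the kind of input that a reduction step such as PCA, UMAP, or t-SNE would compress before clustering.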

Paper Structure (24 sections, 11 equations, 8 figures, 6 tables)

Introduction
Related Works
Error Correction Coding
Clustered Error Correction Coding
Alignment-Based Clustering Methods
K-mer Counting-Based Clustering Methods
Methodology
Feature Engineering
K-means Clustering Algorithm
UMAP
PCA
t-SNE
MLP
Method Discussion
ADRS-CNet
...and 9 more sections

Figures (8)

Figure 1: Major Processes of DNA Storage
Figure 2: The framework for ADRS-CNet
Figure 3: 100 to 199 Clustering accuracy with different dimensions
Figure 4: 100 to 199 Clustering accuracy with categorical axis
Figure 5: 9800 to 9899 clustering accuracy with different dimensions
...and 3 more figures
