
Modeling Quantum Machine Learning for Genomic Data Analysis

Navneet Singh, Shiva Raj Pokhrel

TL;DR

The paper investigates whether quantum machine learning can be applied to binary classification of genomic sequences. It evaluates four QML models (QSVC, Pegasos-QSVC, VQC, QNN) under three feature maps (ZFeatureMap, ZZFeatureMap, PauliFeatureMap) using an open-source Qiskit-based workflow. The genomic data is reduced to four principal components with PCA and then encoded with each feature map. Classifier performance depends strongly on both the feature map and the algorithm: Pegasos-QSVC achieves high recall, while QNN reaches the highest training accuracy but shows signs of overfitting. The convergence analyses cover QSVC’s convex dual formulation and Pegasos’s favorable stochastic optimization behavior; for VQC and QNN they raise questions of expressiveness, gradient-based trainability, and barren plateaus in the quantum parameter landscape. Overall, the work demonstrates the potential of QML for genomic data classification on NISQ-like devices, emphasizes the critical role of feature-map design, and outlines directions for improving robustness, generalization, and noise resilience in practical genomics applications.

Abstract

Quantum Machine Learning (QML) continues to evolve, unlocking new opportunities for diverse applications. In this study, we investigate and evaluate the applicability of QML models for binary classification of genome sequence data by employing various feature mapping techniques. We present an open-source, independent Qiskit-based implementation to conduct experiments on a benchmark genomic dataset. Our simulations reveal that the interplay between feature mapping techniques and QML algorithms significantly influences performance. Notably, the Pegasos Quantum Support Vector Classifier (Pegasos-QSVC) exhibits high sensitivity, particularly excelling in recall metrics, while Quantum Neural Networks (QNN) achieve the highest training accuracy across all feature maps. However, the pronounced variability in classifier performance, dependent on feature mapping, highlights the risk of overfitting to localized output distributions in certain scenarios. This work underscores the transformative potential of QML for genomic data classification while emphasizing the need for continued advancements to enhance the robustness and accuracy of these methodologies.
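
To make the role of the feature map concrete, the following is a minimal NumPy sketch, not the authors' Qiskit implementation. It reduces a synthetic stand-in dataset to four principal components and computes a fidelity kernel for a Z-type encoding. It assumes a single repetition of the encoding (a Hadamard followed by a phase rotation of angle 2·x_i on each qubit); the function names `pca_reduce`, `z_feature_state`, and `fidelity_kernel` are hypothetical, and the random data is not the benchmark genomic dataset.

```python
import numpy as np

def pca_reduce(X, n_components=4):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def z_feature_state(x):
    """Statevector after H then P(2*x_i) on each qubit (assumed: one repetition)."""
    state = np.array([1.0 + 0j])
    for xi in x:
        qubit = np.array([1.0, np.exp(2j * xi)]) / np.sqrt(2)
        state = np.kron(state, qubit)            # product state, one qubit per feature
    return state

def fidelity_kernel(A, B):
    """K[i, j] = |<phi(a_i)|phi(b_j)>|^2 for the encoding above."""
    SA = np.array([z_feature_state(a) for a in A])
    SB = np.array([z_feature_state(b) for b in B])
    return np.abs(SA.conj() @ SB.T) ** 2

# Synthetic stand-in data (hypothetical): 6 samples, 10 raw features.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 10))
X4 = pca_reduce(X)            # four components -> four qubits
K = fidelity_kernel(X4, X4)   # Gram matrix a kernel classifier could consume
```

Under this assumption the kernel factorizes as a product of cos²(x_i − y_i) terms, because the encoding has no entangling gates. Entangling maps such as ZZFeatureMap do not factorize this way, which is one reason the choice of feature map can change classifier behavior.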

Paper Structure (16 sections, 59 equations, 8 figures, 1 table, 4 algorithms)

Figures (8)

  • Figure 1: Illustration of the proposed workflow in this paper. We outline a method for applying QML techniques to classical genomic datasets using NISQ devices: a) Dataset Split: Divide the classical dataset into training and test sets. b) Dimensionality Reduction: Reduce the dataset to four dimensions using Principal Component Analysis (PCA) due to NISQ device limitations. c) Quantum Encoding: Encode the dataset into quantum data using ZFeatureMap, ZZFeatureMap, and PauliFeatureMap for Hilbert space representation. d) QML Training: Train various QML algorithms on the quantum data. e) Performance Metrics: Evaluate true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) to calculate accuracy, precision, recall, and F1 score. f) Classification: Use the trained model to classify test sequences.
  • Figure 2: Quantum circuit of ZFeatureMap.
  • Figure 3: Quantum circuit of ZZFeatureMap.
  • Figure 4: Quantum circuit of PauliFeatureMap.
  • Figure 5: Quantum circuit for QSVC.
  • ...and 3 more figures
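
Step (e) of the Figure 1 workflow derives accuracy, precision, recall, and F1 score from the four confusion-matrix counts. The sketch below shows the standard definitions. The helper name `classification_metrics` and the example counts are hypothetical and are not results from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # share of predicted positives that are correct
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # share of actual positives that are found
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts chosen only to exercise the formulas.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

Recall is the metric the abstract highlights for Pegasos-QSVC. It penalizes only false negatives, so a classifier can score high on recall while its precision stays modest.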