Compare Similarities Between DNA Sequences Using Permutation-Invariant Quantum Kernel

Chenyu Shi, Gabriele Leoni, Mauro Petrillo, Antonio Puertas Gallardo, Hao Wang

TL;DR

This work addresses DNA sequence similarity under the edit distance with move operations (EDM), which is NP-complete to compute exactly, by proposing a permutation-invariant variational quantum kernel that uses SIC-POVM encoding to map the four nucleotides to mutually equidistant quantum states. The kernel approximates the permutation-insensitive symmetry of EDM with a permutation-invariant parameterized circuit and gains expressiveness through data re-uploading, enabling polynomial-time inference on short sequences. In simulations, the method achieves higher accuracy than classical deep kernel baselines while using orders of magnitude fewer trainable parameters, highlighting the potential of symmetry-informed quantum kernels for bioscience tasks such as antimicrobial resistance (AMR) gene detection. Limitations include scalability to longer sequences on NISQ hardware and the need for more expressive architectures; future work could extend to direct permutation-invariant kernels and broader downstream applications.

Abstract

Computing the similarity between two DNA sequences is of vital importance in bioscience, yet it can be computationally expensive on classical hardware. For example, the edit distance with move operations (EDM), a DNA similarity measure of interest in biology, is proven to be NP-complete to compute exactly. Recently, applied quantum algorithms have been anticipated to offer potential advantages over classical approaches. In this paper, we propose a novel variational quantum kernel model that serves as a surrogate for estimating the similarity between DNA sequences defined by EDM. Since the EDM metric exhibits a pairwise permutation-insensitive property, we incorporate a permutation-invariant structure into the variational quantum kernel to approximate this symmetry. Furthermore, to encode the four nucleotide bases as quantum states, we introduce a theoretically motivated encoding scheme based on symmetric informationally complete positive operator-valued measure (SIC-POVM) states. This encoding ensures mutual equivalence among bases: the four bases are mapped to quantum states such that every pair is equidistant on the Bloch sphere. We show experimentally that, equipped with the permutation-invariant circuit design and the mutual-equivalence encoding, the proposed quantum kernel model achieves strong performance in approximating the similarity defined by EDM. Compared with classical kernel learning methods, our quantum approach achieves significantly higher accuracy while using substantially fewer trainable parameters.
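
The equidistance property of the SIC-POVM encoding can be checked numerically. The sketch below is illustrative only: it assumes the standard tetrahedral parametrization of the four qubit SIC-POVM states, and the assignment of T, G and C to particular phases (as well as the helper names `sic_states` and `bloch_vector`) is a hypothetical choice, not taken from the paper.

```python
import numpy as np

def sic_states():
    """Return four qubit states forming a regular tetrahedron on the Bloch sphere.

    Assumption: standard SIC-POVM parametrization. Only A -> |0> follows the
    figure captions; the phases assigned to T, G, C are illustrative.
    """
    a = 1 / np.sqrt(3)      # amplitude on |0> for the three non-trivial states
    b = np.sqrt(2 / 3)      # magnitude of the amplitude on |1>
    phases = {"T": 0.0, "G": 2 * np.pi / 3, "C": 4 * np.pi / 3}
    states = {"A": np.array([1.0, 0.0], dtype=complex)}
    for base, phi in phases.items():
        states[base] = np.array([a, b * np.exp(1j * phi)], dtype=complex)
    return states

def bloch_vector(psi):
    """Bloch vector (<X>, <Y>, <Z>) of a normalized single-qubit pure state."""
    coherence = np.conj(psi[0]) * psi[1]
    return np.array([2 * coherence.real,
                     2 * coherence.imag,
                     abs(psi[0]) ** 2 - abs(psi[1]) ** 2])

bases = "ATGC"
S = sic_states()

# Squared overlaps |<psi_p|psi_q>|^2: 1 on the diagonal, 1/3 everywhere else.
overlap = np.array([[abs(np.vdot(S[p], S[q])) ** 2 for q in bases] for p in bases])

# Bloch-vector dot products: 1 on the diagonal, -1/3 for every distinct pair,
# which is the defining property of a regular tetrahedron's vertices.
R = np.array([bloch_vector(S[b]) for b in bases])
gram = R @ R.T

print(np.round(overlap, 3))
print(np.round(gram, 3))
```

Because every pair of distinct bases has the same squared overlap, no pair of nucleotides is treated as intrinsically more similar than any other before training.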

Paper Structure

This paper contains 16 sections, 8 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: The working paradigm of variational quantum computing. The quantum computer applies a variational quantum circuit to process information or construct models for a certain task. The measurement results will be fed to a classical computer for optimization. The optimizer on the classical computer will update the parameters in the quantum circuit. After multiple loops, the parameters in the quantum circuit will be adjusted to be suitable for solving the target task.
  • Figure 1: In the encoding layer, each nucleotide base is assigned a unique quantum state on its corresponding qubit. These four quantum states are known as the SIC-POVM states.
  • Figure 2: The model sketch of the variational quantum kernel. It consists of a parameterized layer $U(\theta)$, an encoding layer $V(\cdot)$, and their corresponding conjugate transpose layers, arranged sequentially. The initial state is set to $\ket{0}$. The output of the model is the probability of measuring $0$ in the computational basis, which equals the Frobenius inner product between two density matrices. Thus, the entire model constructs a kernel function that measures similarity using a variational quantum circuit.
  • Figure 3: The four SIC-POVM states form a regular tetrahedron on the Bloch sphere, so that all four bases are encoded as mutually equidistant states.
  • Figure 4: The encoding layer for the length-4 DNA sequence "ATGC". For Adenine, there is no quantum gate because only an identity operator is needed. For the other bases, a rotation gate $R_y$ and a phase gate $P$ are applied with corresponding angles.
  • ...and 4 more figures
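
The encoding layer described for Figure 4 and the kernel output described for Figure 2 can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the rotation and phase angles are the standard tetrahedral ones, their assignment to T, G and C is hypothetical, and the trainable layer $U(\theta)$ and data re-uploading are omitted, so only the encoding-induced kernel is shown. All function names are illustrative.

```python
import numpy as np

# Assumed angles: R_y(THETA) puts amplitude 1/sqrt(3) on |0>; the phase gate
# then selects one of three tetrahedron vertices. A needs no gate (identity).
THETA = 2 * np.arccos(1 / np.sqrt(3))
ANGLES = {                       # base -> (R_y angle, phase angle); illustrative
    "A": None,
    "T": (THETA, 0.0),
    "G": (THETA, 2 * np.pi / 3),
    "C": (THETA, 4 * np.pi / 3),
}

def ry(theta):
    """Single-qubit rotation about the y axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def phase(phi):
    """Phase gate P(phi) = diag(1, exp(i*phi))."""
    return np.array([[1, 0], [0, np.exp(1j * phi)]], dtype=complex)

def encode_base(base):
    """State of one qubit after the encoding gates for a single nucleotide."""
    zero = np.array([1, 0], dtype=complex)
    if ANGLES[base] is None:
        return zero
    theta, phi = ANGLES[base]
    return phase(phi) @ ry(theta) @ zero

def encode_sequence(seq):
    """Product state with one qubit per nucleotide, e.g. 4 qubits for 'ATGC'."""
    state = np.array([1.0 + 0j])
    for base in seq:
        state = np.kron(state, encode_base(base))
    return state

def kernel(x, y):
    """Probability of measuring all zeros after V(x) followed by V(y)^dagger."""
    return abs(np.vdot(encode_sequence(y), encode_sequence(x))) ** 2

def frobenius_kernel(x, y):
    """Same value computed as Tr(rho_x rho_y) for the two pure-state density matrices."""
    px, py = encode_sequence(x), encode_sequence(y)
    rho_x, rho_y = np.outer(px, px.conj()), np.outer(py, py.conj())
    return np.trace(rho_x @ rho_y).real

k_same = kernel("ATGC", "ATGC")       # identical sequences: 1
k_one_diff = kernel("ATGC", "ATGA")   # one mismatching position: 1/3
k_all_diff = kernel("ATGC", "TACG")   # four mismatching positions: (1/3)**4
```

Without the trainable layer, the kernel value depends only on the number of mismatching positions, each contributing a factor of 1/3. The parameterized, permutation-invariant layers described in the paper are what let the model move beyond this position-wise comparison toward the EDM-defined similarity.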