Application of Quantum Tensor Networks for Protein Classification

Debarshi Kundu, Archisman Ghosh, Srinivasan Ekambaram, Jian Wang, Nikolay Dokholyan, Swaroop Ghosh

TL;DR

This work treats protein sequences as sentences and applies Quantum Natural Language Processing to classify subcellular localization with Quantum Tensor Networks (QTNs). It introduces two quantum architectures, PTN and CTN, each with hierarchical and uniform parameter-sharing variants, and compares them against a classical ESM2-based baseline. The best quantum model, hPTN, reaches $0.94$ accuracy, close to the classical benchmark of $0.98$, while using only about $800$ parameters versus $8\times 10^{6}$ for the ESM2 model, which illustrates potential efficiency gains. The simulations are noise-free, however, so future work needs to incorporate quantum noise to assess viability and scalability on real hardware.
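To put the reported figures side by side, the short Python sketch below computes the parameter ratio and the accuracy gap from the numbers quoted above. The variable names are illustrative, and the quantum parameter count is only approximate ("about 800"), so the ratio should be read as an order-of-magnitude estimate.

```python
# Figures quoted in the TL;DR; the quantum parameter count is approximate ("about 800").
quantum_params = 800
classical_params = 8 * 10**6   # ESM2 baseline, as reported
quantum_acc = 0.94             # best quantum model (hPTN)
classical_acc = 0.98           # classical ESM2-based benchmark

# Integer ratio of parameter counts and the accuracy difference, rounded to 2 places.
param_ratio = classical_params // quantum_params
acc_gap = round(classical_acc - quantum_acc, 2)

print(f"ESM2 uses ~{param_ratio:,}x more parameters")
print(f"Accuracy gap: {acc_gap:.2f}")
```

Under these numbers the classical baseline carries roughly four orders of magnitude more parameters for a four-point accuracy advantage.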

Abstract

We show that protein sequences can be thought of as sentences in natural language processing and can be parsed, using the existing Quantum Natural Language Processing framework, into parameterized quantum circuits with a reasonable number of qubits, which can be trained to solve various protein-related machine-learning problems. We classify proteins based on their subcellular locations, a pivotal task in bioinformatics that is key to understanding biological processes and disease mechanisms. Leveraging quantum-enhanced processing capabilities, we demonstrate that Quantum Tensor Networks (QTN) can effectively handle the complexity and diversity of protein sequences. We present a detailed methodology that adapts QTN architectures to the nuanced requirements of protein data, supported by comprehensive experimental results. We demonstrate two distinct QTNs, inspired by classical recurrent neural networks (RNN) and convolutional neural networks (CNN), to solve this binary classification task. Our top-performing quantum model achieves 94% accuracy, which is comparable to the performance of a classical model that uses ESM2 protein language model embeddings. Notably, the ESM2 model is extremely large, containing 8 million parameters in its smallest configuration, whereas our best quantum model requires only around 800 parameters. We demonstrate that these hybrid models exhibit promising performance, showcasing their potential to compete with classical models of similar complexity.

Paper Structure (10 sections, 8 figures, 2 tables)

This paper contains 10 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: A diagram describing the flow of binary classification of the protein sequence.
  • Figure 2: Assignment of parameterized quantum circuits $U(\phi_i)$ to boxes labeled $i$. This is an example of a function definition in which the words are mapped to qubits (bits in the case of blue wires) via the unitary matrices $U(\phi)$. The $\perp$ is represented as an all-zeroes state, a postselection, or a discard.
  • Figure 3: A sentence is broken into words $m_1 \ldots m_n$, converted to the corresponding unitaries based on the Functor rules in Fig. \ref{fig:schematic}, and finally run through the Scheme (QTN).
  • Figure 4: An example of parsing a demonstrative protein sequence $AGSQ$ into a protein syntax tree based on CTN.
  • Figure 5: Convolutional tensor network (CTN)
  • ...and 3 more figures
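
The Figure 2 caption describes words being mapped to parameterized unitaries $U(\phi_i)$. The following is a minimal sketch of that assignment, not the paper's circuits. It assumes, hypothetically, that each amino-acid letter is one word and that each $U(\phi)$ is a single-qubit RY rotation. The names `ry`, `assign_unitaries`, and `is_unitary`, and the parameter values, are illustrative and do not come from the source.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one "word" each

def ry(phi):
    """Single-qubit RY rotation as a 2x2 real matrix (assumed stand-in for U(phi))."""
    c, s = math.cos(phi / 2), math.sin(phi / 2)
    return [[c, -s], [s, c]]

def assign_unitaries(sequence, params):
    """Map each residue to the unitary built from its shared parameter."""
    return [ry(params[res]) for res in sequence]

def is_unitary(u, tol=1e-12):
    """Check U^T U = I for a real 2x2 matrix."""
    for i in range(2):
        for j in range(2):
            dot = sum(u[k][i] * u[k][j] for k in range(2))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True

# Hypothetical, untrained parameters: one angle per residue type.
params = {res: 0.1 * (i + 1) for i, res in enumerate(AMINO_ACIDS)}

# The demonstrative sequence from the Figure 4 caption.
unitaries = assign_unitaries("AGSQ", params)
print(len(unitaries), all(is_unitary(u) for u in unitaries))
```

In this sketch the number of trainable angles depends on the vocabulary size (20 residue types) and not on the sequence length, because repeated residues reuse the same parameter.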