Optimizing Quantum Data Embeddings for Ligand-Based Virtual Screening

Junggu Choi, Tak Hur, Seokhoon Jeong, Kyle L. Jung, Jun Bae Park, Junho Lee, Jae U. Jung, Daniel K. Park

TL;DR

This study tackles ligand-based virtual screening (LBVS) under data scarcity by developing and evaluating quantum data embeddings, including Neural Quantum Embedding (NQE) with classically intractable ZZ and XYZ feature maps, as well as hybrid quantum-classical schemes. Quantum embeddings, particularly when paired with a quantum convolutional neural network (QCNN) or with transfer-learning hybrids, often outperform classical RBF-based embeddings, with the most pronounced gains in limited-data and class-imbalanced settings. Reusing quantum-trained representations in classical models (quantum-pretrained embeddings) can yield additional performance benefits. Together, these results highlight the potential of quantum data embeddings to enhance LBVS and to inform practical, data-efficient screening pipelines, and they point to hardware validation and interpretable embedding design as future directions.

Abstract

Effective molecular representations are essential for ligand-based virtual screening. We investigate how quantum data embedding strategies can improve this task by developing and evaluating a family of quantum-classical hybrid embedding approaches. These approaches combine classical neural networks with parameterized quantum circuits in different ways to generate expressive molecular representations and are assessed across two benchmark datasets of different sizes: the LIT-PCBA and COVID-19 collections. Across multiple biological targets and class-imbalance settings, several quantum and hybrid embedding variants consistently outperform classical baselines, especially in limited-data regimes. These results highlight the potential of optimized quantum data embeddings as data-efficient tools for ligand-based virtual screening.

Paper Structure

This paper contains 24 sections, 7 equations, 5 figures, 11 tables.

Figures (5)

  • Figure 1: Overview of this research. $g(x,w)$ represents the classical neural network.
  • Figure 2: (a) Schematic representation of the 8-qubit quantum convolutional neural network (QCNN) integrated with neural quantum embedding (NQE). The purple section indicates the NQE, which transforms the classical data $x$ into the quantum state $|x\rangle$. The blue section represents the architecture of the 8-qubit QCNN. (b) Structural diagram of the SU(4) ansatz. $U_3(\theta_1,\theta_2,\theta_3)$ denotes an arbitrary single-qubit gate, which can be expressed as $U_3(\theta_1,\theta_2,\theta_3)=R_z(\theta_2)R_x(-\pi/2)R_z(\theta_1)R_x(\pi/2)R_z(\theta_3)$, where $R_i(\theta)$ represents a rotation by $\theta$ around the $i$-axis of the Bloch sphere.
  • Figure 3: The diagram of the data embedding methods applied to the LIT-PCBA dataset.
  • Figure 4: Trace distance changes before and after training of NQE with the ZZ feature map. “Before” and “After” indicate the trace distances of NQE prior to and following training, respectively. “1:1” and “1:6” denote the class ratios of the dataset.
  • Figure 5: Trace distance changes before and after training of NQE with the XYZ feature map. “Before” and “After” indicate the trace distances of NQE prior to and following training, respectively. “1:1” and “1:6” denote the class ratios of the dataset.
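
Two quantities in the captions above can be made concrete with a small NumPy sketch: the $U_3$ decomposition from Figure 2(b) and the trace distance reported in Figures 4 and 5. This is an illustration under stated assumptions, not the authors' implementation. The rotation convention $R_i(\theta)=e^{-i\theta\sigma_i/2}$ is assumed, as is the reading that the trace distance is taken between class-averaged density matrices (the captions do not say which states are compared). The single-qubit encoding, the helper names (`u3`, `ensemble_state`, `trace_distance`), and the input values are hypothetical and stand in for the 8-qubit NQE.

```python
import numpy as np

def rz(theta):
    """Rotation by theta around the z-axis (assumed convention exp(-i*theta*Z/2))."""
    return np.array([[np.exp(-1j * theta / 2), 0],
                     [0, np.exp(1j * theta / 2)]])

def rx(theta):
    """Rotation by theta around the x-axis (assumed convention exp(-i*theta*X/2))."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s],
                     [-1j * s, c]])

def u3(t1, t2, t3):
    """Single-qubit gate composed exactly as written in the Figure 2 caption."""
    return rz(t2) @ rx(-np.pi / 2) @ rz(t1) @ rx(np.pi / 2) @ rz(t3)

def ensemble_state(states):
    """Average of |psi><psi| over a list of state vectors (a mixed state)."""
    return np.mean([np.outer(psi, psi.conj()) for psi in states], axis=0)

def trace_distance(rho, sigma):
    """T(rho, sigma) = 0.5 * ||rho - sigma||_1, via eigenvalues of the Hermitian difference."""
    eigvals = np.linalg.eigvalsh(rho - sigma)
    return 0.5 * np.sum(np.abs(eigvals))

# Hypothetical toy data: two classes of scalar features encoded on one qubit.
ket0 = np.array([1.0, 0.0], dtype=complex)
class_a = [u3(x, 0.0, 0.0) @ ket0 for x in (0.1, 0.2, 0.3)]
class_b = [u3(x, 0.0, 0.0) @ ket0 for x in (2.8, 2.9, 3.0)]

# Distance between the two class ensembles; larger means easier to distinguish.
d = trace_distance(ensemble_state(class_a), ensemble_state(class_b))
print(f"trace distance between class ensembles: {d:.3f}")
```

Under the assumed convention, the inner product $R_x(-\pi/2)R_z(\theta_1)R_x(\pi/2)$ equals a rotation about the $y$-axis, so the caption's expression is a Z-Y-Z Euler decomposition. A trace distance near 1 means the two ensembles are almost perfectly distinguishable by a single measurement, and a value near 0 means they are nearly identical. That is why the figures compare the distance before and after training.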