Optimizing Quantum Data Embeddings for Ligand-Based Virtual Screening
Junggu Choi, Tak Hur, Seokhoon Jeong, Kyle L. Jung, Jun Bae Park, Junho Lee, Jae U. Jung, Daniel K. Park
TL;DR
This study addresses ligand-based virtual screening (LBVS) under data scarcity by developing and evaluating quantum data embeddings. These include Neural Quantum Embedding (NQE) built on classically intractable ZZ and XYZ feature maps, as well as hybrid quantum–classical schemes. Quantum embeddings often outperform classical radial basis function (RBF) embeddings, particularly when paired with a quantum convolutional neural network (QCNN) or with transfer-learning hybrids, and the gains are largest in limited-data and class-imbalanced settings. Reusing quantum-trained representations in classical models (quantum-pretrained embeddings) can yield further improvements. Together, these results suggest that quantum embeddings can strengthen LBVS and support practical, data-efficient screening pipelines. Validation on quantum hardware and the design of interpretable embeddings remain directions for future work.
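To make the contrast between a quantum embedding and a classical RBF embedding concrete, the sketch below simulates a ZZ-style feature map exactly with NumPy statevectors. It then compares the fidelity kernel |<psi(x)|psi(x')>|^2 that the map induces with an ordinary RBF kernel. It assumes a Havlíček-style form with full pairwise ZZ terms, pair coefficients (pi - x_i)(pi - x_j), and two repetitions. Scaling and sign conventions differ between libraries, and the function names are hypothetical; this is an illustration, not the authors' implementation. Exact simulation costs O(2^n) memory per state, so it is only practical for a few qubits.

```python
import itertools
import numpy as np

def hadamard_all(n):
    """Matrix of a Hadamard gate on every one of n qubits (H tensored n times)."""
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    out = np.array([[1.0]])
    for _ in range(n):
        out = np.kron(out, h)
    return out

def zz_phases(x):
    """Diagonal of exp(i[sum_i x_i Z_i + sum_{i<j} (pi - x_i)(pi - x_j) Z_i Z_j])."""
    n = len(x)
    bits = np.array(list(itertools.product([0, 1], repeat=n)))
    s = 1 - 2 * bits  # Z eigenvalue per qubit: +1 for bit 0, -1 for bit 1
    phase = s @ x
    for i, j in itertools.combinations(range(n), 2):
        phase = phase + (np.pi - x[i]) * (np.pi - x[j]) * s[:, i] * s[:, j]
    return np.exp(1j * phase)

def zz_state(x, reps=2):
    """Apply (phase layer after Hadamard layer) `reps` times to |0...0>."""
    n = len(x)
    h = hadamard_all(n)
    diag = zz_phases(x)
    psi = np.zeros(2 ** n, dtype=complex)
    psi[0] = 1.0
    for _ in range(reps):
        psi = diag * (h @ psi)
    return psi

def fidelity_kernel(X):
    """K[a, b] = |<psi(x_a)|psi(x_b)>|^2 for all pairs of rows of X."""
    states = np.array([zz_state(x) for x in X])
    return np.abs(states.conj() @ states.T) ** 2

def rbf_kernel(X, gamma=1.0):
    """Classical RBF kernel exp(-gamma * ||x_a - x_b||^2)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq)

# Usage: five random 4-feature "molecules" (synthetic data, 4 qubits)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, np.pi, size=(5, 4))
Kq = fidelity_kernel(X)
Kc = rbf_kernel(X, gamma=0.5)
print(np.round(Kq, 3))
print(np.round(Kc, 3))
```

Both matrices are symmetric with ones on the diagonal. The RBF kernel depends only on Euclidean distance, whereas the ZZ kernel depends on the entangling phase structure. That difference in inductive bias is the property the embedding comparison probes.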
Abstract
Effective molecular representations are essential for ligand-based virtual screening. We investigate how quantum data embedding strategies can improve this task by developing and evaluating a family of quantum-classical hybrid embedding approaches. These approaches combine classical neural networks with parameterized quantum circuits in different ways to generate expressive molecular representations and are assessed across two benchmark datasets of different sizes: the LIT-PCBA and COVID-19 collections. Across multiple biological targets and class-imbalance settings, several quantum and hybrid embedding variants consistently outperform classical baselines, especially in limited-data regimes. These results highlight the potential of optimized quantum data embeddings as data-efficient tools for ligand-based virtual screening.
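The following is a minimal sketch of one way to chain a classical neural network with a parameterized quantum circuit. A small, randomly initialized two-layer network maps a molecular descriptor vector to rotation angles, and those angles feed a ZZ-style feature map, simulated exactly in NumPy. The embedding is then scored by the trace distance between the averaged states of the two classes. That scoring step is assumed here to mirror the trace-distance criterion associated with NQE. The layer sizes, the pi-times-sigmoid angle scaling, the synthetic data, and every name (`TinyEncoder`, `class_density_matrix`, and so on) are hypothetical, and no training step is shown.

```python
import itertools
import numpy as np

def zz_state(angles, reps=2):
    """Statevector of an assumed Havlicek-style ZZ feature map with full pairwise terms."""
    n = len(angles)
    h1 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    h = np.array([[1.0]])
    for _ in range(n):
        h = np.kron(h, h1)  # Hadamard on every qubit
    s = 1 - 2 * np.array(list(itertools.product([0, 1], repeat=n)))  # Z eigenvalues
    phase = s @ angles
    for i, j in itertools.combinations(range(n), 2):
        phase = phase + (np.pi - angles[i]) * (np.pi - angles[j]) * s[:, i] * s[:, j]
    diag = np.exp(1j * phase)
    psi = np.zeros(2 ** n, dtype=complex)
    psi[0] = 1.0
    for _ in range(reps):
        psi = diag * (h @ psi)
    return psi

class TinyEncoder:
    """Hypothetical two-layer classical network mapping descriptors to n_qubits angles."""
    def __init__(self, n_in, n_hidden, n_qubits, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.5, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(scale=0.5, size=(n_hidden, n_qubits))
        self.b2 = np.zeros(n_qubits)

    def __call__(self, x):
        hidden = np.tanh(x @ self.w1 + self.b1)
        # Squash outputs into (0, pi) so they can serve as rotation angles
        return np.pi / (1.0 + np.exp(-(hidden @ self.w2 + self.b2)))

def class_density_matrix(encoder, X):
    """Ensemble average of |psi><psi| over the embedded samples of one class."""
    states = np.array([zz_state(encoder(x)) for x in X])
    return states.T @ states.conj() / len(X)

def trace_distance(rho, sigma):
    """D(rho, sigma) = 0.5 * sum of |eigenvalues| of the Hermitian difference."""
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(rho - sigma)))

# Usage: synthetic 6-dimensional descriptors for "active" and "inactive" ligands
rng = np.random.default_rng(1)
X_active = rng.normal(loc=0.5, size=(8, 6))
X_inactive = rng.normal(loc=-0.5, size=(8, 6))
encoder = TinyEncoder(n_in=6, n_hidden=8, n_qubits=3)
rho = class_density_matrix(encoder, X_active)
sigma = class_density_matrix(encoder, X_inactive)
D = trace_distance(rho, sigma)
print(f"trace distance between class ensembles: {D:.3f}")
```

With equal class priors, the best achievable measurement accuracy for telling the two ensembles apart is (1 + D) / 2 (the Helstrom bound). Training the classical encoder to increase D is therefore a natural way to optimize the embedding before any downstream quantum classifier is attached.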
