Fusion of classical and quantum kernels enables accurate and robust two-sample tests
Yu Terada, Yugo Ogio, Ken Arai, Hiroyuki Tezuka, Yu Tanaka
TL;DR
The paper tackles the challenge of performing reliable two-sample tests with limited data by refining kernel-based testing via the MMD-FUSE framework. It introduces quantum kernels and a hybrid quantum-classical kernel pool to expand the expressive hypothesis space, and demonstrates improved test power, particularly in small-sample and high-dimensional settings. Empirical results on synthetic and real-world biomedical datasets show that quantum kernels can outperform classical choices when properly tuned, and that hybridization offers robust performance across diverse data types. The work suggests that data-adaptive kernel design, including principled weighting of kernel types, can yield practical gains for kernel-based hypothesis testing in constrained data regimes.
Abstract
Two-sample tests, which determine whether two sets of samples are drawn from the same distribution, are widely used across scientific fields and machine learning, for example in evaluating the effectiveness of drugs and in A/B testing of marketing strategies. Kernel-based hypothesis tests embed the data into a reproducing kernel Hilbert space (RKHS), which allows them to capture complex, high-dimensional structure in the data and obtain accurate results in a model-free way. Although the choice of kernel is crucial to their performance, little is understood about how to choose a kernel, especially for small datasets. Here we aim to construct a hypothesis test that remains effective even for small datasets, building on MMD-FUSE, a kernel-based testing framework founded on the maximum mean discrepancy (MMD). To this end, we enhance the MMD-FUSE framework by incorporating quantum kernels and propose a novel hybrid testing strategy that fuses classical and quantum kernels. This approach creates a powerful and adaptive test by combining the domain-specific inductive biases of classical kernels with the unique expressive power of quantum kernels. We evaluate our method on various synthetic and real-world clinical datasets, and our experiments reveal two key findings: 1) With appropriate hyperparameter tuning, MMD-FUSE with quantum kernels consistently improves test power over its classical counterparts, especially for small and high-dimensional data. 2) The proposed hybrid framework is remarkably robust, adapting to different data characteristics and achieving high test power across diverse scenarios. These results highlight the potential of quantum-inspired and hybrid kernel strategies for building more effective statistical tests, offering a versatile tool for data analysis where sample sizes are limited.
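For readers unfamiliar with kernel two-sample tests, the following is a minimal sketch of the basic building block: a permutation test on the squared MMD with a single Gaussian kernel. It is not the MMD-FUSE procedure, and it involves no quantum kernel or kernel fusion. The function names, the fixed bandwidth, and the number of permutations are all illustrative assumptions rather than choices taken from the paper.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def gram_matrix(points, bandwidth):
    """Kernel matrix over the pooled sample, computed once and reused."""
    n = len(points)
    return [[gaussian_kernel(points[i], points[j], bandwidth) for j in range(n)]
            for i in range(n)]


def mmd2_biased(gram, idx_x, idx_y):
    """Biased (V-statistic) estimate of the squared MMD from a pooled Gram matrix."""
    kxx = sum(gram[i][j] for i in idx_x for j in idx_x) / len(idx_x) ** 2
    kyy = sum(gram[i][j] for i in idx_y for j in idx_y) / len(idx_y) ** 2
    kxy = sum(gram[i][j] for i in idx_x for j in idx_y) / (len(idx_x) * len(idx_y))
    return kxx + kyy - 2.0 * kxy


def mmd_permutation_test(xs, ys, bandwidth=1.0, n_permutations=200, seed=0):
    """Return (observed squared MMD, permutation p-value).

    Assumption: a single fixed bandwidth; a fused test would instead
    combine statistics from a whole pool of kernels.
    """
    rng = random.Random(seed)
    pooled = list(xs) + list(ys)
    gram = gram_matrix(pooled, bandwidth)
    n_x = len(xs)
    indices = list(range(len(pooled)))
    observed = mmd2_biased(gram, indices[:n_x], indices[n_x:])

    # Under the null hypothesis the sample labels are exchangeable, so
    # shuffling them yields draws from the null distribution of the statistic.
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(indices)
        if mmd2_biased(gram, indices[:n_x], indices[n_x:]) >= observed:
            exceed += 1

    # The +1 terms count the observed statistic itself and keep the test valid.
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, p_value
```

A call such as `mmd_permutation_test(xs, ys, bandwidth=1.0)` takes two lists of equal-length numeric tuples and returns the statistic together with its p-value; the null hypothesis of equal distributions is rejected when the p-value falls below the chosen significance level. The result depends on the bandwidth, which illustrates the kernel-selection problem that the abstract describes.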
