A quantum k-nearest neighbors algorithm based on the Euclidean distance estimation
Enrico Zardini, Enrico Blanzieri, Davide Pastorello
TL;DR
This work introduces a quantum k-NN variant that uses the Euclidean distance by employing two amplitude-encoded data representations (extension and translation) and a simple, oracle-free Bell-H circuit to estimate distances in parallel. It provides a formal algorithm, analyzes complexity under QRAM assumptions, and implements the approach in Python with Qiskit across statevector, simulation, and classical modalities. Empirical results show that, in the ideal statevector setting, the quantum method can match or surpass classical baselines, while practical performance on noisy or limited-shot simulations depends on the encoding and number of shots, with translation+avg and extension+avg offering complementary advantages. The work advances practical quantum distance estimation for k-NN and highlights the conditions under which quantum speedups may be realized, guiding future experiments on larger datasets and real hardware.
Abstract
The k-nearest neighbors (k-NN) algorithm is a basic machine learning (ML) method, and several quantum versions of it, employing different distance metrics, have been presented in the last few years. Although the Euclidean distance is one of the most widely used distance metrics in ML, it has received little consideration in the development of these quantum variants. In this article, a novel quantum k-NN algorithm based on the Euclidean distance is introduced. Specifically, the algorithm is characterized by a quantum encoding requiring a low number of qubits and a simple quantum circuit not involving oracles, two aspects that favor its realization. In addition to the mathematical formulation and some complexity observations, a detailed empirical evaluation with simulations is presented. In particular, the results show that the formulation is correct, that the performance of the algorithm drops when the number of measurements is limited, that the algorithm is competitive with some classical baseline methods in the ideal case, and that the performance can be improved by increasing the number of measurements.
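The dependence on the number of measurements can be illustrated with a minimal classical sketch. It does not reproduce the encodings or the circuit of this article. Instead, it assumes a generic textbook construction: for real unit vectors a and b, applying a Hadamard gate to the ancilla of the state (|0>|a> + |1>|b>)/sqrt(2) yields the outcome 1 with probability ||a - b||^2 / 4. The function names, the normalization of the inputs to unit length, and the binomial sampling model are assumptions made only for illustration.

```python
import numpy as np


def ancilla_one_probability(a, b):
    """Probability of measuring 1 on the ancilla for unit-normalized real a, b.

    Assumed construction (not the article's circuit): Hadamard on the
    ancilla of (|0>|a> + |1>|b>)/sqrt(2), giving P(1) = ||a - b||^2 / 4.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    p = np.sum((a - b) ** 2) / 4.0
    # Guard against floating-point values slightly outside [0, 1].
    return float(np.clip(p, 0.0, 1.0))


def estimate_squared_distance(a, b, shots, rng):
    """Estimate ||a - b||^2 from a finite number of simulated measurements."""
    p = ancilla_one_probability(a, b)
    ones = rng.binomial(shots, p)  # number of shots that returned 1
    return 4.0 * ones / shots


def knn_predict(train_X, train_y, x, k, shots=None, seed=0):
    """Majority vote among the k nearest training points.

    shots=None uses the exact probabilities (ideal case); an integer
    uses estimates affected by sampling noise.
    """
    rng = np.random.default_rng(seed)
    if shots is None:
        d2 = [4.0 * ancilla_one_probability(t, x) for t in train_X]
    else:
        d2 = [estimate_squared_distance(t, x, shots, rng) for t in train_X]
    nearest = np.argsort(d2, kind="stable")[:k]
    labels, counts = np.unique(np.asarray(train_y)[nearest], return_counts=True)
    return labels[np.argmax(counts)]


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    y = [0, 0, 1, 1]
    query = [1.0, 0.05]
    print("ideal:", knn_predict(X, y, query, k=3))
    for shots in (8, 64, 1024):
        print(shots, "shots:", knn_predict(X, y, query, k=3, shots=shots))
```

In this model the standard error of each estimated probability scales as 1/sqrt(shots), so with few shots the ranking of close neighbors can change, while larger shot counts approach the ideal ordering.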
