
A quantum k-nearest neighbors algorithm based on the Euclidean distance estimation

Enrico Zardini, Enrico Blanzieri, Davide Pastorello

TL;DR

This work introduces a quantum k-NN variant based on the Euclidean distance. It relies on two amplitude-encoded data representations (extension and translation) and a simple, oracle-free Bell-H circuit that estimates distances in parallel. The paper gives a formal algorithm, analyzes its complexity under QRAM assumptions, and implements the approach in Python with Qiskit in three execution modalities: statevector, simulation, and classical. In the ideal statevector setting, the quantum method matches or surpasses the classical baselines; in noisy or limited-shot simulations, performance depends on the encoding and the number of shots, with the (translation, avg) and (extension, avg) configurations offering complementary advantages. The work advances practical quantum distance estimation for k-NN, identifies the conditions under which quantum speedups may be realized, and motivates future experiments on larger datasets and real hardware.
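As a rough illustration of how an oracle-free circuit can turn interference into a distance, the following sketch emulates a Hadamard-test-style measurement classically in plain Python. It assumes the two vectors are already real and unit-norm and are loaded on the two branches of a single ancilla qubit; it does not reproduce the paper's extension or translation encodings, and the helper names (normalize, ancilla_one_probability, estimate_distance) are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a real vector to unit Euclidean norm (needed for amplitude encoding)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def ancilla_one_probability(x, y):
    """Exact P(ancilla = 1) for a simple interference circuit (assumed model).

    Start from (|0>|x> + |1>|y>) / sqrt(2) with x, y real unit vectors, then apply
    a Hadamard to the ancilla. The |1> branch carries amplitudes (x_i - y_i) / 2,
    so P(1) = ||x - y||^2 / 4.
    """
    branch_one = [(a - b) / 2.0 for a, b in zip(x, y)]
    return sum(amp * amp for amp in branch_one)


def estimate_distance(x, y, shots=None, seed=0):
    """Estimate ||x - y|| = 2 * sqrt(P(1)).

    shots=None uses the exact probability (no measurements); an integer draws
    that many simulated measurements, so the estimate carries sampling noise.
    """
    p_one = ancilla_one_probability(x, y)
    if shots is not None:
        rng = random.Random(seed)
        ones = sum(1 for _ in range(shots) if rng.random() < p_one)
        p_one = ones / shots
    return 2.0 * math.sqrt(p_one)


x = normalize([1.0, 2.0])
y = normalize([2.0, 1.0])
exact = math.dist(x, y)
print(abs(estimate_distance(x, y) - exact) < 1e-12)  # True
print(estimate_distance(x, y, shots=1024))  # noisy estimate near the exact value
```

The shots argument loosely mirrors the gap between an exact-probability run and a shot-based run: the sampling error of the estimated probability shrinks roughly as one over the square root of the number of shots.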

Abstract

k-nearest neighbors (k-NN) is a basic machine learning (ML) algorithm, and several quantum versions of it, employing different distance metrics, have been presented in the last few years. Although the Euclidean distance is one of the most widely used distance metrics in ML, it has received little attention in the development of these quantum variants. In this article, a novel quantum k-NN algorithm based on the Euclidean distance is introduced. Specifically, the algorithm is characterized by a quantum encoding that requires a low number of qubits and a simple quantum circuit that does not involve oracles, two aspects that favor its realization. In addition to the mathematical formulation and some complexity observations, a detailed empirical evaluation with simulations is presented. In particular, the results show the correctness of the formulation, a drop in the performance of the algorithm when the number of measurements is limited, its competitiveness with respect to some classical baseline methods in the ideal case, and the possibility of improving its performance by increasing the number of measurements.
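The link between a limited number of measurements and k-NN performance can be sketched by extending the same interference idea to a uniform index register over N training points, so that one batch of shots yields estimates of all N squared distances at once, followed by a majority vote among the k smallest. This is a hypothetical model written for illustration (stdlib Python, toy 2-D unit vectors, hypothetical helper names), not the paper's circuit or its Qiskit implementation.

```python
import math
import random
from collections import Counter


def normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sq_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def sampled_sq_distances(test, train, shots, seed=0):
    """Estimate ||test - train[j]||^2 for every j from one batch of shots.

    Hypothetical model: a uniform index register over the N training points, each
    branch running the ancilla interference, so P(j, ancilla=1) = ||test - t_j||^2 / (4N).
    """
    n = len(train)
    p_one_given_j = [sq_distance(test, t) / 4.0 for t in train]
    rng = random.Random(seed)
    ones = [0] * n
    for _ in range(shots):
        j = rng.randrange(n)  # index register outcome: P(j) = 1/N
        if rng.random() < p_one_given_j[j]:  # ancilla outcome 1 given j
            ones[j] += 1
    # ones[j] / shots estimates P(j, 1); invert the model to get the squared distance
    return [4.0 * n * c / shots for c in ones]


def knn_predict(sq_dists, labels, k):
    """Majority vote among the k smallest (exact or estimated) squared distances."""
    nearest = sorted(range(len(labels)), key=lambda j: sq_dists[j])[:k]
    return Counter(labels[j] for j in nearest).most_common(1)[0][0]


train = [normalize(v) for v in ([1.0, 0.1], [1.0, 0.2], [0.9, 0.1],
                                 [0.1, 1.0], [0.2, 1.0], [0.1, 0.9])]
labels = ["a", "a", "a", "b", "b", "b"]
test = normalize([1.0, 0.3])

exact = [sq_distance(test, t) for t in train]
estimated = sampled_sq_distances(test, train, shots=1024)
print(knn_predict(exact, labels, k=3))  # a
print(knn_predict(estimated, labels, k=3))
```

Because every shot lands on a single index, each distance estimate only receives about shots/N samples, which is one way to see why a limited number of measurements degrades the neighbor ranking as N grows.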

Paper Structure

This paper contains 35 sections, 25 equations, 12 figures, and 15 tables.

Figures (12)

  • Figure 1: Example of a quantum circuit for the quantum $k$-NN based on the Euclidean distance, with $N=4$, $d=2$, and the simulation execution modality (the statevector modality does not include the final measurements)
  • Figure 2: Comparison between the classical and statevector execution modalities in terms of accuracy (a), Jaccard index (b), and Average Jaccard score (c). The configuration used for statevector is (extension, avg), but the results are the same for all configurations. Each point corresponds to a dataset fold
  • Figure 3: Comparison between statevector (extension, avg) and simulation (extension, avg) in terms of accuracy (a), Jaccard index (b), and Average Jaccard score (c). The number of shots for simulation is 1024, and each point corresponds to a dataset fold
  • Figure 4: Comparison of (encoding, distance estimate) configurations in terms of accuracy (a) and Jaccard index (b) for the simulation execution modality. The number of shots is 1024, and each data point corresponds to the difference for a (dataset fold, k value) pair
  • Figure 5: Comparison between some classical baseline methods and statevector in terms of accuracy. The configuration used for statevector is (translation, avg), but the results are the same for all configurations. Each point corresponds to a dataset fold
  • ...and 7 more figures
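Figures 2 to 4 report a Jaccard index alongside accuracy. The summary does not define it, so the following sketch assumes one plausible reading: the Jaccard index between the set of k nearest neighbors selected from exact (classical) distances and the set selected from estimated distances, for a single test sample. The distances are made-up illustration values, the helper names are hypothetical, and the Average Jaccard score is not modeled.

```python
def k_nearest_indices(sq_dists, k):
    """Indices of the k smallest distances (ties broken by index order)."""
    return set(sorted(range(len(sq_dists)), key=lambda j: sq_dists[j])[:k])


def jaccard_index(a, b):
    """Size of the intersection over size of the union; 1.0 when both sets are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical distances for one test sample: exact (classical) vs estimated.
classical = [0.10, 0.50, 0.20, 0.90, 0.30]
estimated = [0.12, 0.25, 0.22, 0.80, 0.40]
k = 3
print(jaccard_index(k_nearest_indices(classical, k), k_nearest_indices(estimated, k)))  # 0.5
```

Under this reading, a value of 1.0 means the estimated distances select exactly the same neighbors as the exact ones, even if the distance values themselves differ.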