
TQml Simulator: optimized simulation of quantum machine learning

Viacheslav Kuzmin, Basil Kyriacou, Tatjana Protasevich, Mateusz Papierz, Mo Kordzanganeh, Alexey Melnikov

TL;DR

This work tackles the computational bottleneck of state-vector simulation of quantum machine learning (QML) circuits by exploiting gate-specific information to optimize the simulation layer by layer. It benchmarks a suite of methods, including Unitary, Einsum, Permutation, Diagonal, and H-Rz expansion techniques, and shows that the optimal technique for a given gate layer depends on the number of qubits. Building on these insights, the authors introduce the TQml Simulator, which selects the most efficient method for each layer and reports up to an order-of-magnitude speedup over the PennyLane default.qubit simulator across diverse circuits and hardware, with an additional evaluation of a JAX back-end. The work also provides hardware-specific benchmarks for IBM and IonQ native gates and discusses memory behavior, paving the way for scalable, hardware-adaptive QML simulation and future integration with tensor-network approaches. In practice, these results enable faster training and inference of QML models on CPUs and accelerators, and the framework can adapt to emerging quantum hardware and back-ends.

Abstract

Hardware-efficient circuits employed in Quantum Machine Learning are typically composed of alternating layers of uniformly applied gates. High-speed numerical simulators for such circuits are crucial for advancing research in this field. In this work, we numerically benchmark universal and gate-specific techniques for simulating the action of layers of gates on quantum state vectors, aiming to accelerate the overall simulation of Quantum Machine Learning algorithms. Our analysis shows that the optimal simulation method for a given layer of gates depends on the number of qubits involved, and that a tailored combination of techniques can yield substantial performance gains in the forward and backward passes for a given circuit. Building on these insights, we developed a numerical simulator, named TQml Simulator, that employs the most efficient simulation method for each layer in a given circuit. We evaluated TQml Simulator on circuits constructed from standard gate sets, such as rotations and CNOTs, as well as on native gates from IonQ and IBM quantum processing units. In most cases, our simulator outperforms the equivalent PennyLane default.qubit simulator by up to a factor of 10, depending on the circuit, the number of qubits, the batch size of the input data, and the hardware used.
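The two universal techniques for a layer of arbitrary single-qubit gates (the explicit Unitary operation and Einstein summation) can be sketched in NumPy. This is a minimal illustration, not the TQml Simulator's code: the function names are hypothetical, and it assumes qubit 0 is the leftmost Kronecker factor, i.e. the most significant axis of the (2,)*n state tensor. The Unitary method materializes a 2^n x 2^n matrix (4^n entries), whereas the Einsum method contracts each 2x2 gate with one index of the state and only ever holds 2^n amplitudes.

```python
import numpy as np
from functools import reduce

def random_unitary(rng):
    # The Q factor of a QR decomposition of a complex Gaussian matrix is unitary
    z = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    q, _ = np.linalg.qr(z)
    return q

def apply_layer_unitary(gates, psi):
    """Unitary method: build U^(1) ⊗ ... ⊗ U^(n) explicitly, then one mat-vec."""
    return reduce(np.kron, gates) @ psi

def apply_layer_einsum(gates, psi):
    """Einsum method: contract each 2x2 gate U[i', i] with its own index of the state tensor."""
    n = len(gates)
    ins, outs = "abcdefghijklm"[:n], "nopqrstuvwxyz"[:n]   # i_k and i'_k (n <= 13 in this sketch)
    spec = ",".join(o + i for o, i in zip(outs, ins)) + "," + ins + "->" + outs
    out = np.einsum(spec, *gates, psi.reshape((2,) * n), optimize=True)
    return out.reshape(-1)

# Usage: both methods agree on a random normalized 4-qubit state
rng = np.random.default_rng(0)
gates = [random_unitary(rng) for _ in range(4)]
psi = rng.normal(size=16) + 1j * rng.normal(size=16)
psi /= np.linalg.norm(psi)
print(np.allclose(apply_layer_unitary(gates, psi), apply_layer_einsum(gates, psi)))  # True
```

Because `np.einsum` receives all n+1 operands in one call, `optimize=True` lets NumPy choose a pairwise contraction order instead of a single naive loop over all 2n indices.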

Paper Structure

This paper contains 17 sections, 20 equations, 13 figures, and 1 table.

Figures (13)

  • Figure 1: Tensor diagrams illustrating two approaches for applying a layer of single-qubit gates, $(\prod_i U^{(i)})|\psi\rangle$ (where the superscript denotes the qubit index). (a) The explicit unitary operation is represented as $(\bigotimes_{i=1}^4 U^{(i)}) \times \psi$, while (b) the Einstein summation approach is depicted by $U^{i'_1}_{i_1} U^{i'_2}_{i_2} U^{i'_3}_{i_3} U^{i'_4}_{i_4} \, \psi_{i_1 i_2 i_3 i_4}$. In these diagrams, boxes denote tensors and wires represent their indices; connected wires indicate summation over the corresponding indices.
  • Figure 2: Forward pass time on a single CPU thread for layers of unparametrized (a) H, (b) Sx, and (c) native IBM ECR gates, and of (d) parametrized MS gates, as a function of the number of qubits. Results are obtained using the PennyLane default.qubit simulator (PL) and the methods evaluated in this work: Einsum, Unitary operation (Uni.), Unitary real operation (Uni. real), and Fast Hadamard-Walsh Transform (FHWT).
  • Figure 3: Forward pass time on a single CPU thread for layers composed of unparametrized permutation gates (a) X and (b) CNOT as a function of the number of qubits. Results are obtained using the PennyLane default.qubit simulator (PL) and the methods evaluated in this work: Einsum, Unitary operation (Uni.), Unitary real operation (Uni. real), and Permutation (Perm.).
  • Figure 4: Forward pass time on a single CPU thread for layers composed of parametrized (a) diagonal Rz, (b) antidiagonal native IonQ GPI, and (c) diagonal Rzz gates as a function of the number of qubits. Results are obtained using the PennyLane default.qubit simulator (PL) and the methods evaluated in this work: Einsum, Unitary operation (Uni.), Diagonal Einsum (Diag. Einsum), Eigenphase Computation (Diag. EC), and Diagonal Tensor Product (Diag. TP).
  • Figure 5: Forward pass time on a single CPU thread for layers composed of parametrized rotation gates (a) Ry, (b) Rx, (c) general Rot, and (d) native IonQ GPI2 gates as a function of the number of qubits. Results are obtained using the PennyLane default.qubit simulator (PL) and the methods evaluated in this work: Einsum, Unitary operation (Uni.), Unitary real operation (Uni. real), and H-Rz Expansion (H-Rz Exp.).
  • ...and 8 more figures
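
Figure 2 includes a Fast Hadamard-Walsh Transform (FHWT) method for layers of H gates. Since H^{⊗n} is the normalized Walsh-Hadamard matrix, it can be applied with n butterfly passes over the flat state vector in O(n 2^n) operations instead of building the 2^n x 2^n matrix. The sketch below is a standard iterative transform written for illustration; the function name is hypothetical and the paper's implementation may differ.

```python
import numpy as np

def hadamard_layer_fwht(psi):
    """H on every qubit via an iterative fast Walsh-Hadamard transform, O(n 2^n)."""
    out = np.asarray(psi, dtype=complex).reshape(-1)
    dim = out.size
    h = 1
    while h < dim:
        blocks = out.reshape(-1, 2, h)                      # pair entries j and j + h in each block of 2h
        a, b = blocks[:, 0, :], blocks[:, 1, :]
        out = np.stack([a + b, a - b], axis=1).reshape(-1)  # one butterfly pass
        h *= 2
    return out / np.sqrt(dim)                               # (1/sqrt(2))^n normalization

# Usage: compare with the explicit 8x8 matrix H ⊗ H ⊗ H for n = 3 qubits
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
psi = np.arange(8, dtype=complex) / np.linalg.norm(np.arange(8))
print(np.allclose(hadamard_layer_fwht(psi), np.kron(np.kron(H, H), H) @ psi))  # True
```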
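
Figure 3 evaluates a Permutation method for X and CNOT layers. These gates only reorder computational-basis amplitudes, so a whole layer can be reduced to one precomputed index array and a single gather. Below is a minimal sketch for a CNOT layer, assuming gates are given as (control, target) pairs and qubit 0 is the most significant bit; the helper names are hypothetical.

```python
import numpy as np

def cnot_layer_permutation(n, pairs):
    """Index array perm with new_psi = psi[perm] for CNOTs given as (control, target) pairs.

    Qubit 0 is the most significant bit. Since (U psi)[y] = psi[f^-1(y)] and each CNOT is
    its own inverse, f^-1 applies the same CNOTs in reverse order.
    """
    src = np.arange(2 ** n)
    for control, target in reversed(pairs):
        c_bit = (src >> (n - 1 - control)) & 1      # control bit of each index
        src = src ^ (c_bit << (n - 1 - target))     # flip the target bit where control is 1
    return src

def apply_permutation(psi, perm):
    # A single gather: no floating-point arithmetic on the amplitudes
    return psi[..., perm]

# Usage: CNOT(0 -> 1) followed by CNOT(1 -> 2) on 3 qubits, checked against dense matrices
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
psi = np.arange(8, dtype=complex) / np.linalg.norm(np.arange(8))
out = apply_permutation(psi, cnot_layer_permutation(3, [(0, 1), (1, 2)]))
expected = np.kron(np.eye(2), CNOT) @ np.kron(CNOT, np.eye(2)) @ psi
print(np.allclose(out, expected))  # True
```

The permutation depends only on the circuit structure, so it can be computed once and reused across forward passes; `psi[..., perm]` also works on a (batch, 2^n) array.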
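
Figure 4 compares several diagonal techniques for layers such as Rz. Because Rz(θ) = diag(e^{-iθ/2}, e^{iθ/2}), a layer of Rz gates is a diagonal 2^n x 2^n matrix, and applying it reduces to an elementwise multiplication by 2^n phases. The sketch shows two ways to build those phases: summing the per-qubit eigenphases ∓θ_q/2 for every basis state, and taking the Kronecker product of the 2-element diagonals. These are illustrative readings of the "Diag. EC" and "Diag. TP" labels with hypothetical names, not the paper's code; qubit 0 is assumed to be the most significant bit.

```python
import numpy as np
from functools import reduce

def z_signs(n):
    """(2^n, n) array of Pauli-Z eigenvalues z_q(x): +1 if qubit q is 0 in basis state x, else -1."""
    bits = (np.arange(2 ** n)[:, None] >> (n - 1 - np.arange(n))) & 1   # qubit 0 = MSB
    return 1 - 2 * bits

def rz_layer_eigenphase(thetas, psi):
    """Phases from summed eigenphases: exp(-i/2 * sum_q theta_q z_q(x))."""
    phases = np.exp(-0.5j * (z_signs(len(thetas)) @ thetas))
    return phases * psi          # broadcasts over a leading batch axis

def rz_layer_tensor_product(thetas, psi):
    """Phases as the Kronecker product of each gate's 2-element diagonal."""
    diag = reduce(np.kron, [np.array([np.exp(-0.5j * t), np.exp(0.5j * t)]) for t in thetas])
    return diag * psi

def rz(t):
    # Dense single-qubit Rz, used only to check both sketches
    return np.diag([np.exp(-0.5j * t), np.exp(0.5j * t)])

# Usage: both constructions match the dense Rz ⊗ Rz ⊗ Rz matrix
thetas = np.array([0.3, -1.2, 2.0])
psi = np.ones(8, dtype=complex) / np.sqrt(8)
full = reduce(np.kron, [rz(t) for t in thetas])
print(np.allclose(rz_layer_eigenphase(thetas, psi), full @ psi))      # True
print(np.allclose(rz_layer_tensor_product(thetas, psi), full @ psi))  # True
```

Neither version forms the dense matrix; both store only the 2^n phases.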
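
Figure 5 lists an H-Rz Expansion method for rotation layers. One identity that makes such an expansion possible is Rx(θ) = H Rz(θ) H, which follows from HZH = X; a layer of Rx gates therefore equals H^{⊗n}, then a diagonal Rz layer, then H^{⊗n}, and none of the three pieces needs a dense matrix. The sketch below assumes this decomposition, unbatched states, and qubit 0 as the most significant bit; the names are hypothetical, and the paper's expansion (also applied to Ry, Rot, and GPI2) may be organized differently.

```python
import numpy as np

def hadamard_layer(psi):
    """H on every qubit: one butterfly (a + b, a - b) / sqrt(2) per qubit axis."""
    n = psi.size.bit_length() - 1
    state = psi.astype(complex)
    for q in range(n):
        s = state.reshape(2 ** q, 2, -1)       # axis 1 is qubit q (qubit 0 = MSB)
        state = np.stack([s[:, 0] + s[:, 1], s[:, 0] - s[:, 1]], axis=1).reshape(-1) / np.sqrt(2)
    return state

def rz_layer(thetas, psi):
    """Diagonal Rz layer: phase exp(-i/2 * sum_q theta_q * z_q) on each basis state."""
    n = len(thetas)
    bits = (np.arange(2 ** n)[:, None] >> (n - 1 - np.arange(n))) & 1
    return np.exp(-0.5j * ((1 - 2 * bits) @ thetas)) * psi

def rx_layer_h_rz(thetas, psi):
    """Rx(theta) = H Rz(theta) H on every qubit, so the layer is H-layer, Rz-layer, H-layer."""
    return hadamard_layer(rz_layer(thetas, hadamard_layer(psi)))

def rx(t):
    # Dense single-qubit Rx, used only to check the expansion
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

# Usage: compare with the dense Rx ⊗ Rx ⊗ Rx matrix on a random 3-qubit state
thetas = np.array([0.4, -0.9, 1.7])
rng = np.random.default_rng(1)
psi = rng.normal(size=8) + 1j * rng.normal(size=8)
psi /= np.linalg.norm(psi)
expected = np.kron(np.kron(rx(0.4), rx(-0.9)), rx(1.7)) @ psi
print(np.allclose(rx_layer_h_rz(thetas, psi), expected))  # True
```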