Des-q: a quantum algorithm to provably speedup retraining of decision trees
Niraj Kumar, Romina Yalovetzky, Changhao Li, Pierre Minssen, Marco Pistoia
TL;DR
This work introduces Des-q, a quantum algorithm for constructing and retraining decision trees for regression and binary classification. It combines KP-tree based quantum-accessible data structures, amplitude-encoded states, and a supervised quantum clustering (q-means) framework to produce piecewise linear, multi-hyperplane splits, guided by feature weights that are estimated quantumly from Pearson or point-biserial correlations. The key contributions are a retraining complexity that is poly-logarithmic in $N$ after an initial data load; explicit procedures for data loading, weight estimation, clustering, and leaf-label extraction; and numerical evidence on real datasets showing accuracy competitive with classical methods, together with a potential speedup for periodic updates. The results suggest that Des-q can maintain performance while substantially accelerating retraining in big-data settings. This conclusion rests on hardware and data-loading assumptions that warrant further study, and dequantized variants may be needed for practical use.
Abstract
Decision trees are widely adopted machine learning models due to their simplicity and explainability. However, as the training data size grows, standard methods become increasingly slow, scaling polynomially with the number of training examples. In this work, we introduce Des-q, a novel quantum algorithm to construct and retrain decision trees for regression and binary classification tasks. Assuming the data stream produces small, periodic increments of new training examples, Des-q significantly reduces the tree retraining time. Des-q achieves a poly-logarithmic complexity in the combined total number of old and new examples, even accounting for the time needed to load the new samples into quantum-accessible memory. To grow the tree from any given node, we perform piecewise linear splits that generate multiple hyperplanes, thus partitioning the input feature space into distinct regions. To determine suitable anchor points for these splits, we develop an efficient supervised quantum clustering method, building upon the q-means algorithm introduced by Kerenidis et al. We benchmark a simulated version of Des-q against state-of-the-art classical methods on multiple data sets and observe that our algorithm achieves similar performance to these decision trees while significantly speeding up the periodic tree retraining.
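To make the split procedure concrete, the following is a minimal, purely classical sketch of a single node split. It is not the quantum algorithm: the KP-tree data loading, the quantum correlation estimation, and q-means are replaced by plain NumPy, so none of the claimed speedup applies. The function names (`feature_weights`, `weighted_kmeans`, `split_node`) are hypothetical, and the sketch assumes that the feature weights enter as a per-feature rescaling of the squared distance and that a leaf value is the mean label of its cluster.

```python
import numpy as np

def feature_weights(X, y):
    """Absolute Pearson correlation of each feature column with the label.

    For a binary 0/1 label this coincides with the point-biserial correlation.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    safe = np.where(denom > 0, denom, 1.0)
    # Constant features (zero variance) get weight 0 instead of a division by zero.
    return np.where(denom > 0, np.abs(Xc.T @ yc) / safe, 0.0)

def weighted_kmeans(X, w, k, n_iter=20, seed=0):
    """Lloyd's k-means under the weighted distance sum_j w_j * (x_j - c_j)**2."""
    rng = np.random.default_rng(seed)
    # Rescaling each column by sqrt(w_j) turns the weighted distance into
    # the ordinary squared Euclidean distance (an assumption of this sketch).
    Z = X * np.sqrt(w)
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]

    def assign(c):
        d = ((Z[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        for c in range(k):
            if np.any(labels == c):  # leave an empty cluster's centroid unchanged
                centroids[c] = Z[labels == c].mean(axis=0)
    return assign(centroids), centroids

def split_node(X, y, k):
    """One split: weight the features, cluster, and label each child by its mean."""
    w = feature_weights(X, y)
    labels, _ = weighted_kmeans(X, w, k)
    leaf_values = np.array([
        y[labels == c].mean() if np.any(labels == c) else np.nan
        for c in range(k)
    ])
    return w, labels, leaf_values
```

Because every point is assigned to its nearest centroid, the boundary between any two children is a hyperplane in the rescaled feature space, which is how $k$ centroids give a piecewise linear, multi-hyperplane split. Calling `split_node(X, y, k)` on a feature matrix `X` of shape `(N, d)` and a label vector `y` of length `N` returns the weights, the child index of each sample, and one value per child.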
