Des-q: a quantum algorithm to provably speedup retraining of decision trees

Niraj Kumar, Romina Yalovetzky, Changhao Li, Pierre Minssen, Marco Pistoia

TL;DR

This work introduces Des-q, a quantum algorithm for constructing and retraining decision trees with regression and binary classification capabilities. It leverages KP-tree based quantum-accessible data structures, amplitude-encoded states, and a supervised quantum clustering (q-means) framework to realize piecewise linear, multi-hyperplane splits guided by quantum-estimated feature weights from Pearson/point-biserial correlations. The key contributions include a poly-logarithmic-in-$N$ retraining complexity after an initial load, explicit steps for data loading, weight estimation, clustering, and leaf-label extraction, and numerical evidence on real datasets showing competitive accuracy with a clear potential speedup for periodic updates. The results suggest Des-q can maintain performance while dramatically accelerating retraining in big-data contexts, albeit with hardware and data-loading assumptions that warrant further exploration and potential dequantized variants for practicality.

Abstract

Decision trees are widely adopted machine learning models due to their simplicity and explainability. However, as training data size grows, standard methods become increasingly slow, scaling polynomially with the number of training examples. In this work, we introduce Des-q, a novel quantum algorithm to construct and retrain decision trees for regression and binary classification tasks. Assuming the data stream produces small, periodic increments of new training examples, Des-q significantly reduces the tree retraining time. Des-q achieves a logarithmic complexity in the combined total number of old and new examples, even accounting for the time needed to load the new samples into quantum-accessible memory. Our approach to grow the tree from any given node involves performing piecewise linear splits to generate multiple hyperplanes, thus partitioning the input feature space into distinct regions. To determine the suitable anchor points for these splits, we develop an efficient quantum-supervised clustering method, building upon the q-means algorithm introduced by Kerenidis et al. We benchmark the simulated version of Des-q against the state-of-the-art classical methods on multiple data sets and observe that our algorithm exhibits similar performance to the state-of-the-art decision trees while significantly speeding up the periodic tree retraining.
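
The supervised clustering step described above can be illustrated with a purely classical stand-in. The sketch below is not the quantum routine from the paper: it computes one weight per feature as the absolute Pearson correlation with the label (for a binary label this coincides with the point-biserial correlation) and then runs ordinary Lloyd-style k-means under a weighted squared Euclidean distance. The function names `feature_weights` and `weighted_kmeans`, the use of the absolute value, and the exact form of the weighted distance are assumptions made for illustration.

```python
import numpy as np

def feature_weights(X, y):
    """Absolute Pearson correlation of each feature column with the label.

    Assumption of this sketch: the weight is |r_j|. Constant features
    (zero variance) receive weight 0 instead of triggering a division by zero.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    return np.abs(corr)

def weighted_kmeans(X, w, k, n_iter=50, seed=0):
    """Lloyd's algorithm with distance sum_j w_j * (x_j - c_j)^2.

    Scaling each column by sqrt(w_j) turns the weighted distance into the
    plain squared Euclidean distance, so standard k-means updates apply.
    Returned centroids live in the rescaled space.
    """
    rng = np.random.default_rng(seed)
    Xw = np.asarray(X, dtype=float) * np.sqrt(w)
    centroids = Xw[rng.choice(len(Xw), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((Xw[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        new = np.array([
            Xw[assign == c].mean(axis=0) if np.any(assign == c) else centroids[c]
            for c in range(k)
        ])
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    # Final assignment against the final centroids.
    d2 = ((Xw[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centroids
```

Each cluster returned by `weighted_kmeans` plays the role of one child node, so a single call corresponds to one level of tree growth with `k` branches. Features that barely correlate with the label are shrunk toward zero and contribute little to the partition.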

Paper Structure

This paper contains 50 sections, 24 theorems, 128 equations, 6 figures, 4 tables, 2 algorithms.

Key Result

Lemma 5.1

Let $X \in \mathbb{R}^{N \times d}$ be a given dataset. Then there exists a classical data structure that stores the rows of $X$, with memory and construction time $T_{kp} = \mathcal{O}(Nd \log^2(Nd))$, such that a quantum algorithm can access the data in time $T = \mathcal{O}(\text{poly}\log (Nd))$.
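
To make the data structure in Lemma 5.1 more concrete, here is a minimal classical sketch of a KP-tree for a single row vector, assuming the usual construction in which leaves store squared entries (with their signs) and every internal node stores the sum of its children. The class name `KPTree` and its methods are hypothetical, and the sketch only mimics the bookkeeping: it does not perform the quantum state preparation itself. The `sample` method is a classical analogue of descending the tree with conditional rotations.

```python
import math
import random

class KPTree:
    """Toy binary tree over one vector, stored in heap layout (root at index 1)."""

    def __init__(self, dim):
        self.size = 1
        while self.size < dim:          # round the leaf count up to a power of two
            self.size *= 2
        self.tree = [0.0] * (2 * self.size)
        self.signs = [1] * self.size

    def update(self, i, value):
        """Set entry i and refresh the O(log dim) partial sums above it."""
        pos = self.size + i
        self.tree[pos] = value * value
        self.signs[i] = -1 if value < 0 else 1
        pos //= 2
        while pos >= 1:
            self.tree[pos] = self.tree[2 * pos] + self.tree[2 * pos + 1]
            pos //= 2

    def norm_squared(self):
        """The root holds the squared Euclidean norm of the vector."""
        return self.tree[1]

    def amplitudes(self):
        """Entries of the normalized vector, i.e. the target amplitude encoding."""
        norm = math.sqrt(self.tree[1])
        return [s * math.sqrt(self.tree[self.size + i]) / norm
                for i, s in enumerate(self.signs)]

    def sample(self, rng):
        """Draw index i with probability x_i^2 / ||x||^2 by walking root to leaf."""
        if self.tree[1] == 0.0:
            raise ValueError("cannot sample from an all-zero vector")
        pos = 1
        while pos < self.size:
            left = self.tree[2 * pos]
            pos = 2 * pos if rng.random() * self.tree[pos] < left else 2 * pos + 1
        return pos - self.size
```

A single update touches only the path from one leaf to the root, which is why appending or changing a small batch of entries is cheap compared with rebuilding the whole structure. This is the property the retraining argument relies on once the initial load has been paid for.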

Figures (6)

  • Figure 1: Diagram of Des-q. We highlight the procedure to construct a decision tree from the root node to depth $D$. The yellow arrows indicate communication between classical components, whereas the light blue arrows indicate communication between classical and quantum components. To grow the tree from the root to depth 1, we load the data into the KP-tree data structure, from which the samples are queried in superposition by the quantum computer, and the feature weights are estimated. Subsequently, one performs weighted (supervised) clustering to generate $k$ clusters corresponding to depth 1. The above procedure is repeated to grow the tree up to depth $D$. Finally, we perform the leaf label extraction, where we assign classes to the leaf nodes.
  • Figure 2: Entropy as a function of the number of clusters. Des-c is compared with the same method but without weight (no weight). The boxes represent the statistics over the ten folds considered. We include the baseline, which is the entropy calculated by the baseline method. The shaded area corresponds to the standard deviation of the median entropy obtained by the baseline method (the median value is shown with the solid line).
  • Figure 3: Performance in the training of the decision trees for the PIMA dataset: (a) shows the entropy at each depth as a function of the tree depth and (b) shows the accuracy (in $\%$) as a function of the tree depth. Des-c (solid line) is compared to the same method with no weight (no weight) (dotted line) for different numbers of clusters ($k$), shown in colors. We compare against the baseline (cyan solid line). The values shown correspond to the mean across the folds. We avoid plotting the error bars (standard deviation) for visualization purposes; the error values are in line with those in the table of main classification results.
  • Figure 4: Variance as a function of the number of clusters. Des-c is compared with the same method but without weight (no weight). The boxes correspond to the statistics over the ten folds considered. We include the baseline, which is the variance calculated by the baseline method. The shaded area corresponds to the standard deviation of the variance obtained by the baseline method (the median value is shown with the solid line).
  • Figure 5: Performance in the training of the decision trees for the regression of the Boston housing dataset: (a) shows the variance at a given depth as a function of the tree depth and (b) shows the RMSE of the predictors as a function of the tree depth. The performance of Des-c (solid line) is compared to the same method with no weight (no weight) (dotted line) for different numbers of clusters ($k$), shown in colors. We compare against the baseline (cyan solid line). The values shown correspond to the mean across the folds. The standard deviation is not plotted as error bars to help visualize the trend. In the tables, we report the values with their standard deviation.
  • ...and 1 more figure
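
The figure captions above refer to the entropy of a clustering and to the leaf label extraction step without spelling out the formulas. As a rough illustration only, the sketch below assumes that the entropy of a split is the sample-weighted average of the Shannon entropy of the labels inside each cluster, and that a leaf is labeled by majority vote. Both choices, and the function names `cluster_entropy` and `leaf_labels`, are assumptions of this sketch and not definitions taken from the paper.

```python
import numpy as np

def cluster_entropy(labels, clusters):
    """Sample-weighted mean Shannon entropy (in bits) of labels within clusters."""
    labels = np.asarray(labels)
    clusters = np.asarray(clusters)
    n = len(labels)
    total = 0.0
    for c in np.unique(clusters):
        members = labels[clusters == c]
        _, counts = np.unique(members, return_counts=True)
        p = counts / counts.sum()
        entropy_c = -np.sum(p * np.log2(p))   # p > 0 by construction
        total += (len(members) / n) * entropy_c
    return float(total)

def leaf_labels(labels, clusters):
    """Map each cluster id to its most frequent label (majority vote)."""
    labels = np.asarray(labels)
    clusters = np.asarray(clusters)
    result = {}
    for c in np.unique(clusters):
        values, counts = np.unique(labels[clusters == c], return_counts=True)
        result[int(c)] = values[counts.argmax()].item()
    return result
```

A pure split drives `cluster_entropy` to zero, while a split that leaves every cluster evenly mixed between two classes gives one bit. For the regression figures, the analogous quantity would replace the per-cluster entropy with the per-cluster label variance.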

Theorems & Definitions (41)

  • Lemma 5.1
  • proof
  • Lemma 5.2: Superposition over example columns
  • Lemma 5.3: Superposition over label data
  • Theorem 5.1
  • proof
  • Theorem 5.2
  • proof
  • Theorem 5.3
  • Theorem 5.4
  • ...and 31 more