Using matrix-product states for time-series machine learning

Joshua B. Moore, Hugo P. Stackhouse, Ben D. Fulcher, Sahand Mahmoodian

TL;DR

This work introduces MPSTime, an end-to-end MPS-based framework for learning the joint distribution of univariate time-series data to enable unified imputation and classification. Time-series values are mapped into a finite-dimensional Hilbert space via Legendre-based encoding, and the joint distribution is represented as an MPS, trained with a KL/NLL loss and a DMRG-like sweeping procedure. The authors demonstrate competitive performance on synthetic and real-world datasets spanning medicine, energy, and astronomy, and provide interpretable insights through single-site density matrices and entanglement entropy, along with trajectory generation capabilities. They also show robustness to missing data and discuss extensions to translationally invariant MPS for variable-length sequences and forecasting, with the MPSTime code released for public use.

Abstract

Matrix-product states (MPS) have proven to be a versatile ansatz for modeling quantum many-body physics. For many applications, and particularly in one dimension, they capture relevant quantum correlations in many-body wavefunctions while remaining tractable to store and manipulate on a classical computer. This has motivated researchers to also apply the MPS ansatz to machine learning (ML) problems where capturing complex correlations in datasets is also a key requirement. Here, we develop and apply an MPS-based algorithm, MPSTime, for learning a joint probability distribution underlying an observed time-series dataset, and show how it can be used to tackle important time-series ML problems, including classification and imputation. MPSTime can efficiently learn complicated time-series probability distributions directly from data, requires only a moderate maximum MPS bond dimension $\chi_{\rm max}$, with values for our applications ranging between $\chi_{\rm max} = 20$ and $160$, and can be trained for both classification and imputation tasks under a single logarithmic loss function. Using synthetic and publicly available real-world datasets, spanning applications in medicine, energy, and astronomy, we demonstrate performance competitive with state-of-the-art ML approaches, but with the key advantage of encoding the full joint probability distribution learned from the data, which is useful for analyzing and interpreting its underlying structure. This manuscript is supplemented with the release of a publicly available code package MPSTime that implements our approach. The effectiveness of the MPS-based ansatz for capturing complex correlation structures in time-series data makes it a powerful foundation for tackling challenging time-series analysis problems across science, industry, and medicine.

Paper Structure

This paper contains 35 sections, 27 equations, 22 figures, 4 tables, 1 algorithm.

Figures (22)

  • Figure 1: Mapping between quantitative formulations of quantum many-body physics and time-series analysis. Concepts in quantum physics (left, blue shading) and time-series analysis (right, red shading), and their associated theoretical and analytic tools, share similarities that motivate our MPS-based approach to time-series ML. (a) A $d$-level spin chain (left) exhibits one-dimensional (1D) spatial ordering, analogous to the 1D temporal ordering of time series (right). In this work, we propose mapping each time-series sample to a discrete state, similar to an individual $d$-level quantum spin. (b) The probability density captured by the square of the wavefunction $|\Psi(\mathbf{s})|^2$ (left) is analogous to the joint probability density $p(\mathbf{x})$ (right), which assigns a probability to every possible time series. Here, the square of the wavefunction encodes a probability distribution over the space of possible measurement outcomes, while the joint density $p(\mathbf{x})$ defines a distribution over the space of possible time series (i.e., $\mathbb{R}^T$). A single measurement outcome (i.e., one spin chain configuration $\mathbf{s}$) governed by Born's rule corresponds to sampling a single time-series realization (i.e., one time-series instance $\mathbf{x}$) from its generative distribution. (c) The entanglement entropy (left), which quantifies the degree of quantum entanglement between two spatial subsystems (e.g., $A$ and $B$), is conceptually related to classical quantities such as the mutual information (right) between temporal segments (e.g., $A$ and $B$), both capturing statistical dependencies in their respective domains.
  • Figure 2: MPSTime, a framework for time-series machine learning with Matrix-Product States (MPS). (a) Encoding: Each real-valued time-series amplitude $x_t$ is encoded in a $d$-dimensional vector $\phi_t$ by projecting its value onto a truncated orthonormal basis with $d$ basis functions. An entire time series (of length $T$ samples) is then encoded as a set of $T$ vectors $\phi_t$, which we represent as a product state embedded in a $d^{T}$-dimensional Hilbert space. (b) MPS training: Using observed time series from a dataset, a generally entangled MPS -- depicted here using Penrose graphical notation -- with maximum bond dimension $\chi_{\rm{max}}$ is trained with a DMRG-inspired sweeping optimization algorithm to approximate the joint distribution of the training data. Two copies of the trained MPS (one conjugate-transposed, denoted by the dagger $\dagger$) with open physical indices encode the learned distribution, allowing us to sample from and do inference with complex high-dimensional time-series distributions. In this work, we introduce MPS-based learning algorithms, which we collectively refer to as MPSTime, for two important time-series ML problems: (c) imputation (inferring unmeasured values of a time series), and (d) classification (inferring a time-series class). (c) Generative time-series modeling: we use conditional sampling to perform imputation of missing datapoints. Known points of a time series (black lines) project the MPS into a subspace, which is then used to find the unknown datapoints (red line). The same method can be used to tackle some forecasting problems if the missing points are future values. (d) MPS for classification: multiple labeled classes of time series are used to train MPSs. Taking the overlap of unlabeled time-series data (encoded as a product state) with each MPS determines its class.
  • Figure 3: An MPS-based algorithm for time-series imputation. Here we consider an illustrative example of an imputation problem involving a six-site MPS, represented graphically with Penrose notation \cite{penrose1971applications}, where the time-series values $x_2, x_4, x_6$ are observed, and we would like to impute the unobserved values $x_1, x_3, x_5$. (a) Two copies of the trained MPS (one conjugate-transposed, indicated by the dagger, $\dagger$) with open physical indices encode the joint distribution over all possible states, given by $\rho_{1,2,\dots,6}$ (Eq. \ref{eq:outer-joint}). (b) The MPS is projected into a subspace where the states $s_2$, $s_4$, $s_6$, corresponding to each of the known time-series values, have been measured. The updated MPS now encodes the joint distribution over the remaining states $s_1$, $s_3$, $s_5$, conditional upon having measured $s_2, s_4, s_6$. (c) The single-site conditional reduced density matrix $\rho$ (as in Eq. \ref{eq:ssrdm}) is obtained by tracing over all remaining unmeasured sites. By evaluating the probability density function $\textrm{pdf}_i(x) = \phi_i^\dagger(x) \rho_i \phi_i(x)$ for $x$ in the encoding domain, we estimate the first unobserved value $x_1$ and its uncertainty $\Delta x_1$ using the median (see Eq. \ref{eq:pdf_median}) and weighted median absolute deviation (WMAD) of the probability density function, respectively. (d) The MPS is projected onto the estimated state $s_1 = \phi_1(x_1)$ (Eq. \ref{eq:ssrdm}), yielding an updated MPS which encodes the joint distribution over remaining unmeasured states $s_3, s_5$. (e, f, g) We repeat the process of estimating the next missing value (i.e., $x_3$) using the median of the corresponding pdf (conditioned on all previously known and imputed values), then projecting the MPS onto the estimated state (i.e., $s_3 = \phi_3(x_3)$), until all remaining unobserved values are recovered.
  • Figure 4: Matrix-Product State (MPS)-based time-series imputation of synthesized datasets. We compare MPSTime (solid line) against the 1-nearest-neighbor imputation (1-NNI) benchmark (dashed black line) on synthetic datasets of phase-randomized, noisy trendy sinusoids (NTS) generated by the model defined in Eq. \ref{eq:noisy-trendy-sinusoid}. The test set mean absolute error (MAE) between the imputed values and ground-truth (unobserved) values is reported across varying percentages of missing data. (a) Simple setting: MPSTime with three different bond dimensions $\chi_{\rm{max}} = 20, 30, 40$ (blue, orange, and green, respectively) is trained on a dataset exhibiting limited dynamical variation and a correspondingly simpler joint distribution with fixed trend $m$, fixed period $\tau$, and noise level $\sigma = 0.1$. (b) Challenging setting: MPSTime with parameters determined by hyperparameter tuning is trained on four datasets NTS2--5 (blue, orange, green, red, respectively) that exhibit richer dynamical variability and increasingly complex underlying joint distributions, constructed by varying the trends $m$ and/or periods $\tau$ of noise-corrupted, phase-randomized sinusoids with $\sigma = 0.1$, for the model defined in Eq. \ref{eq:noisy-trendy-sinusoid}. The upper-right inset zooms in on missing data percentages from $5\%$ to $85\%$ to highlight performance differences across the datasets. In both (a) and (b), we show representative time-series examples of the MPSTime-imputed values on unseen ('test') time series generated from the same model [NTS1 dataset with $\chi_{\rm max} = 40$ for (a) and NTS5 dataset with $\chi_{\rm max} = 160$ for (b)] in the panels (ii)--(v). The red shading in representative time-series examples indicates the imputation uncertainty, which here is quantified using the weighted median absolute deviation (WMAD).
  • Figure 5: MPSTime exhibits competitive (and often superior) performance to specialist benchmark algorithms for time-series imputation on real-world datasets. Imputation error, reported as the Mean Absolute Error (MAE) and 95% CI across 30 test folds, is shown as a function of the percentage of data missing for each dataset in panels (i): (a)(i) ECG, (b)(i) Power Demand, (c)(i, iv) Astronomy (see Sec. \ref{sec:real-world-dataset-details}). MPSTime (red) is compared against four benchmarks: CDRec (blue), CSDI (green), 1-NNI (gold), and BRITS-I (purple). For Astronomy, the imputation performance on (c)(i) $\gamma$ Doradus and (c)(iv) RR Lyrae is plotted separately due to differing MAE scales. Benchmarks with substantially higher errors for $\gamma$ Doradus stars (BRITS-I and CSDI) and RR Lyrae stars (BRITS-I) were excluded from the Astronomy panels to maintain visual clarity among the top-performing algorithms. Panels (ii)--(v) for ECG and Power Demand show representative MPSTime imputations (solid red line), uncertainty due to encoding error (shaded ribbons), and the corresponding behavior of the 1-nearest neighbor (1-NNI) benchmark (solid gold line). For Astronomy, the panels (c)(ii, iii) and (c)(v, vi) show selected examples of the best performing MPSTime imputations for $\gamma$ Doradus stars and RR Lyrae stars, respectively. In each imputation example, shaded regions denote observed segments, and transparent regions indicate missing (unobserved) data blocks. Ground-truth (unobserved) time-series values (gray line) are overlaid for reference.
  • ...and 17 more figures
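
The encoding step in the Figure 2(a) caption and the single-site estimate in the Figure 3(c) caption can be illustrated with a small NumPy sketch. This is not the MPSTime package's API: the function names `legendre_encode`, `weighted_median`, and `site_estimate` are hypothetical, and the sketch assumes that data are rescaled to the encoding domain $[-1, 1]$ and that the basis functions are Legendre polynomials normalized as $\sqrt{(2k+1)/2}\,P_k(x)$. Under those assumptions, $\textrm{pdf}_i(x) = \phi_i^\dagger(x) \rho_i \phi_i(x)$ integrates to $\mathrm{tr}(\rho_i) = 1$. The density matrices below are toy inputs rather than the output of a trained MPS.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_encode(x, d):
    """Encode values in [-1, 1] as d-dimensional vectors phi(x).

    Assumption: basis functions are sqrt((2k+1)/2) * P_k(x), k = 0..d-1,
    which are orthonormal on [-1, 1].
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # legvander returns P_0(x), ..., P_{d-1}(x) as columns: shape (len(x), d)
    vander = legendre.legvander(x, d - 1)
    norms = np.sqrt((2 * np.arange(d) + 1) / 2.0)
    return vander * norms


def weighted_median(values, weights):
    """Smallest value at which the normalized cumulative weight reaches 0.5."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cdf = np.cumsum(w) / np.sum(w)
    return v[np.searchsorted(cdf, 0.5)]


def site_estimate(rho, n_grid=2001):
    """Evaluate pdf(x) = phi(x)^dagger rho phi(x) on a grid over [-1, 1].

    Returns the median, the weighted median absolute deviation (WMAD),
    the grid, and the pdf values.
    """
    d = rho.shape[0]
    grid = np.linspace(-1.0, 1.0, n_grid)
    phi = legendre_encode(grid, d)  # real-valued, shape (n_grid, d)
    pdf = np.einsum("gi,ij,gj->g", phi, rho, phi).real
    pdf = np.clip(pdf, 0.0, None)  # guard against tiny negative round-off
    median = weighted_median(grid, pdf)
    wmad = weighted_median(np.abs(grid - median), pdf)
    return median, wmad, grid, pdf


# Toy usage: a pure-state density matrix built from the encoding of x = 0.3.
d = 8
v = legendre_encode(0.3, d)[0]
v = v / np.linalg.norm(v)
rho_pure = np.outer(v, v)
x_hat, dx_hat, grid, pdf = site_estimate(rho_pure)
print(f"estimate = {x_hat:.3f}, WMAD = {dx_hat:.3f}")
```

In the imputation loop described in the Figure 3 caption, an estimate like `x_hat` would then be encoded again and used to project the MPS before moving to the next unobserved site. That projection step requires the trained tensors and is not shown here.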