Matching correlated VAR time series

Ernesto Araya, Hemant Tyagi

TL;DR

This work tackles the problem of matching two correlated time series under a CVAR(1,d,T) model, in which a base VAR(1) process is perturbed by an independent copy scaled by $\sigma$ and permuted by an unknown $\pi^*$. The authors derive the maximum-likelihood estimator (MLE), which reduces to a quadratic assignment problem, and propose tractable alternatives: linear assignment (LA), and convex relaxations over the Birkhoff polytope combined with alternating minimization. They establish recovery guarantees for the LA estimator under the condition $\|A^*\|_2 < 1$, identifying regimes of exact, partial, and sublinear recovery as a function of the noise level $\sigma$. They also develop relaxed-MLE algorithms with rounding strategies, and extensive numerical experiments show that LA often matches or outperforms the MLE-relaxation approaches. The results extend planted-matching analysis from i.i.d. point clouds to correlated time-series data and provide practical algorithms for aligning unordered correlated time series, with applications in privacy, sensor fusion, and time-series alignment.
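
The relaxation-and-rounding idea can be illustrated for the permutation step with $A$ held fixed. The Python sketch below is our own illustration, not the authors' algorithm. It assumes standard Gaussian increments, so that (up to a positive factor) the only $\pi$-dependent part of the negative log-likelihood is $\sum_t \|y_t - A y_{t-1}\|_2^2$ with $y_t = x^\#_{\pi^{-1}(t)} - x_t$. It encodes $\pi$ as a permutation matrix $P$ with $P_{ts}=1$ iff $\pi(s)=t$, relaxes $P$ to the Birkhoff polytope of doubly stochastic matrices, and minimizes the resulting convex quadratic with Frank-Wolfe, whose linear subproblem over the polytope is itself a linear assignment. The solver choice and all function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def shift_matrix(T):
    """Lag operator S with (S z)_t = z_{t-1} and (S z)_0 = 0."""
    return np.eye(T, k=-1)


def relaxed_objective_and_grad(P, x, x_sharp, A, S):
    """f(P) = ||M||_F^2 with M = (P x# - x) - S (P x# - x) A^T.

    For a permutation matrix P (P[t, s] = 1 iff pi(s) = t), row t of M is
    y_t - A y_{t-1} with y_t = x#_{pi^{-1}(t)} - x_t.
    """
    B = P @ x_sharp - x
    M = B - S @ B @ A.T
    grad = 2.0 * (M - S.T @ M @ A) @ x_sharp.T
    return float(np.sum(M ** 2)), grad


def frank_wolfe_birkhoff(x, x_sharp, A, n_iter=200):
    """Minimize the convex relaxation over doubly stochastic matrices."""
    T = x.shape[0]
    S = shift_matrix(T)
    P = np.full((T, T), 1.0 / T)  # barycenter of the Birkhoff polytope
    for k in range(n_iter):
        _, G = relaxed_objective_and_grad(P, x, x_sharp, A, S)
        # Linear minimization oracle: the vertex (a permutation matrix)
        # minimizing <G, V> is found by linear assignment.
        rows, cols = linear_sum_assignment(G)
        V = np.zeros((T, T))
        V[rows, cols] = 1.0
        gamma = 2.0 / (k + 2.0)  # standard Frank-Wolfe step size
        P = (1.0 - gamma) * P + gamma * V
    return P


def round_to_permutation(P):
    """Return the permutation matrix closest to P in Frobenius norm, as pi."""
    rows, cols = linear_sum_assignment(P, maximize=True)
    pi = np.empty_like(rows)
    pi[cols] = rows  # P[t, s] ~ 1 means pi(s) = t
    return pi
```

Rounding by linear assignment is one simple choice: it returns the permutation closest to the relaxed solution in Frobenius norm. The relaxed minimizer need not be a permutation, so the rounded output is an estimate rather than a certified MLE.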

Abstract

We study the problem of matching correlated VAR time series databases, where a multivariate time series is observed along with a perturbed and permuted version of it, and the goal is to recover the unknown matching between them. To model this, we introduce a probabilistic framework in which two time series $(x_t)_{t\in[T]},(x^\#_t)_{t\in[T]}$ are jointly generated such that $x^\#_t=x_{\pi^*(t)}+\sigma\tilde{x}_{\pi^*(t)}$, where $(x_t)_{t\in[T]},(\tilde{x}_t)_{t\in[T]}$ are independent and identically distributed vector autoregressive (VAR) time series of order $1$ with Gaussian increments, and $\pi^*$ is a hidden permutation. The objective is to recover $\pi^*$ from the observation of $(x_t)_{t\in[T]},(x^\#_t)_{t\in[T]}$. This generalizes the classical problem of matching independent point clouds to the time series setting. We derive the maximum likelihood estimator (MLE), which leads to a quadratic optimization over permutations, and theoretically analyze an estimator based on linear assignment. For the latter, we establish recovery guarantees, identifying thresholds on $\sigma$ that allow for perfect or partial recovery. Additionally, we propose solving the MLE by considering convex relaxations of the set of permutation matrices (e.g., the Birkhoff polytope). This allows for efficient estimation of $\pi^*$ and the VAR parameters via alternating minimization. Empirically, we find that linear assignment often matches or outperforms MLE-relaxation-based approaches.
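
To make the observation model concrete, here is a minimal NumPy/SciPy sketch that samples from the CVAR model and applies a linear-assignment estimator. It assumes standard normal increments $\mathcal{N}(0, I_d)$ and initial values $x_0=\tilde{x}_0=0$, and it assumes the LA estimator takes the classical point-cloud form $\hat\pi \in \arg\max_\pi \sum_t \langle x^\#_t, x_{\pi(t)}\rangle$; the paper's estimator may differ in its details. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def simulate_cvar(A, T, sigma, rng):
    """Sample (x, x_sharp, pi_star) from a CVAR(1, d, T) model.

    Assumptions (illustrative): increments are N(0, I_d) and both chains
    start from x_0 = 0. pi_star[t] stores pi*(t).
    """
    d = A.shape[0]

    def var1_path():
        z = np.zeros((T, d))
        prev = np.zeros(d)  # x_0 = 0
        for t in range(T):
            prev = A @ prev + rng.standard_normal(d)
            z[t] = prev
        return z

    x = var1_path()        # base series (x_t)
    x_tilde = var1_path()  # independent copy with the same law
    pi_star = rng.permutation(T)
    # x#_t = x_{pi*(t)} + sigma * x~_{pi*(t)}
    x_sharp = x[pi_star] + sigma * x_tilde[pi_star]
    return x, x_sharp, pi_star


def linear_assignment_estimator(x, x_sharp):
    """Estimate pi* by maximizing sum_t <x#_t, x_{pi(t)}> over permutations.

    Since sum_t ||x_{pi(t)}||^2 does not depend on pi, this is equivalent to
    minimizing sum_t ||x#_t - x_{pi(t)}||^2.
    """
    scores = x_sharp @ x.T  # scores[t, s] = <x#_t, x_s>
    rows, cols = linear_sum_assignment(scores, maximize=True)
    pi_hat = np.empty_like(cols)
    pi_hat[rows] = cols
    return pi_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = 0.5 * np.eye(3)  # ||A||_2 = 0.5 < 1
    x, x_sharp, pi_star = simulate_cvar(A, T=200, sigma=0.1, rng=rng)
    pi_hat = linear_assignment_estimator(x, x_sharp)
    print("recovery fraction:", np.mean(pi_hat == pi_star))
```

With $\sigma=0$ the observed series is an exact permutation of the base series, and this estimator recovers $\pi^*$ exactly as long as the points $x_t$ are distinct; as $\sigma$ grows the recovery fraction can drop.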

Paper Structure

This paper contains 57 sections, 17 theorems, 160 equations, 9 figures, and 2 algorithms.

Key Result

Lemma 1

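Given $\sigma$, the MLE for $(\pi^*,A^*)$ is found by jointly minimizing the negative log-likelihood over permutations $\pi$ and matrices $A$, which is a quadratic optimization over permutations, where we set $\pi^{-1}(0), x^\#_0, x_0 \equiv 0$ for notational convenience.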
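
The optimization problem of Lemma 1 can be sketched under an extra assumption that this summary does not state: the increments are standard Gaussian, i.e. $x_t = A^* x_{t-1} + \xi_t$ and $\tilde{x}_t = A^* \tilde{x}_{t-1} + \tilde{\xi}_t$ with $\xi_t, \tilde{\xi}_t \sim \mathcal{N}(0, I_d)$ i.i.d. and $x_0 = \tilde{x}_0 = 0$. For a candidate $\pi$, the model gives $\tilde{x}_t = (x^\#_{\pi^{-1}(t)} - x_t)/\sigma$, and the change of variables $(x, \tilde{x}) \mapsto (x, x^\#)$ has a Jacobian depending only on $\sigma$. Up to additive constants, the negative log-likelihood would then be:

```latex
\mathcal{L}(\pi, A)
  = \frac{1}{2}\sum_{t=1}^{T} \big\| x_t - A x_{t-1} \big\|_2^2
  + \frac{1}{2\sigma^2}\sum_{t=1}^{T}
    \Big\| \big(x^\#_{\pi^{-1}(t)} - x_t\big)
         - A\big(x^\#_{\pi^{-1}(t-1)} - x_{t-1}\big) \Big\|_2^2,
\qquad
(\hat{\pi}, \hat{A}) \in \arg\min_{\pi \in \mathcal{S}_T,\; A \in \mathbb{R}^{d\times d}} \mathcal{L}(\pi, A).
```

The conventions $\pi^{-1}(0), x^\#_0, x_0 \equiv 0$ make the $t=1$ terms well defined. The second sum couples $\pi^{-1}(t)$ with $\pi^{-1}(t-1)$, which is why the problem in $\pi$ is quadratic rather than linear. With a different increment covariance the weights would change, so the exact display in the paper may differ.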

Figures (9)

  • Figure 1: Recovery fraction vs. scale $\theta$ using the relaxed-MLE-with-rounding algorithm. We assume that $A^*$ is known and of the parametric form defined in the paper, and consider different values of $\theta$. For each $(\theta,\sigma)$ pair, the plotted value is the average over $30$ Monte Carlo samples of the CVAR($1,d,T;A^\ast,\pi^\ast,\sigma$) model. The error bars reflect one standard deviation above and below the mean.
  • Figure 2: Recovery fraction vs. noise $\sigma$ using the relaxed-MLE-with-rounding algorithm. The setting is analogous to Figure 1.
  • Figure 3: Recovery fraction vs. scale $\theta$ using the alternating-minimization matching algorithm with $K=5$. $A^*$ (unknown) is of the same parametric form. We average over $30$ Monte Carlo samples of the CVAR($1,d,T;A^\ast,\pi^\ast,\sigma$) model. The error bars reflect one standard deviation above and below the mean.
  • Figure 4: Recovery fraction vs. noise $\sigma$ with $A^*$ unknown. This figure complements Figure 3, under an analogous setting.
  • Figure 5: Estimation error for $A^*$. We fix $d=5$, $\sigma=0.5$, $\theta=0.5$. One panel plots MSE($A$) for $T\in\{10,20,30,50,100\}$, averaged over $30$ Monte Carlo samples (the error bars reflect one standard deviation above and below the mean). The other panel is a scatter plot, over $50$ samples, of the errors in estimating $\Pi^*$ and $A^*$. The dashed line represents the linear trend.
  • ...and 4 more figures
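
Figures 3 to 5 involve estimating the unknown $A^*$. In an alternating scheme, the $A$-step for a fixed permutation can be written in closed form. The sketch below assumes the standard-Gaussian negative log-likelihood $\frac{1}{2}\sum_t\|x_t - A x_{t-1}\|^2 + \frac{1}{2\sigma^2}\sum_t\|y_t - A y_{t-1}\|^2$ with $y_t = x^\#_{\pi^{-1}(t)} - x_t$, whose minimizer in $A$ solves weighted normal equations. It also takes MSE($A$) to mean the average squared entrywise error, which is a hypothetical definition; the function names are illustrative.

```python
import numpy as np


def lagged(z):
    """Shift rows down one step: row t holds z_{t-1}, with z_0 = 0."""
    out = np.zeros_like(z)
    out[1:] = z[:-1]
    return out


def estimate_A(x, x_sharp, pi, sigma):
    """Closed-form A-step for a fixed permutation (pi[s] = pi(s), sigma > 0).

    Minimizes 0.5 * sum_t ||x_t - A x_{t-1}||^2
            + 0.5 / sigma^2 * sum_t ||y_t - A y_{t-1}||^2
    with y_t = x#_{pi^{-1}(t)} - x_t.
    """
    pi_inv = np.argsort(pi)  # pi_inv[t] = pi^{-1}(t)
    y = x_sharp[pi_inv] - x  # equals sigma * x~_t when pi is the true one
    x_lag, y_lag = lagged(x), lagged(y)
    w = 1.0 / sigma**2
    # Normal equations: A @ C0 = C1, with C0 symmetric.
    C1 = x.T @ x_lag + w * (y.T @ y_lag)
    C0 = x_lag.T @ x_lag + w * (y_lag.T @ y_lag)
    return np.linalg.solve(C0, C1.T).T


def mse(A_hat, A_true):
    """Average squared entrywise error (hypothetical definition of MSE(A))."""
    return float(np.mean((A_hat - A_true) ** 2))
```

Alternating minimization would interleave this $A$-step with a permutation step (for example a linear-assignment or relaxed-and-rounded update) for a few rounds. With the true permutation, $y_t = \sigma\tilde{x}_t$, so the update reduces to ordinary least squares pooled over both VAR paths.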

Theorems & Definitions (35)

  • Remark 1
  • Lemma 1: MLE for CVAR given $\sigma$
  • Proof
  • Remark 2: MLE when $\sigma=0$
  • Lemma 2
  • Remark 3: Unordered base time-series
  • Remark 4
  • Theorem 1
  • Remark 5: On the factor $(1-\|A^*\|_2)^5$
  • Remark 6: Gaussian assumption on $(\xi_t)_{t=1}^T$
  • ...and 25 more