TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series

Arjun Ashok, Étienne Marcotte, Valentina Zantedeschi, Nicolas Chapados, Alexandre Drouin

TL;DR

TACTiS-2 is a new model for multivariate probabilistic time series prediction that flexibly addresses forecasting, interpolation, and their combinations. Its number of distributional parameters scales linearly with the number of variables instead of factorially.

Abstract

We introduce a new model for multivariate probabilistic time series prediction, designed to flexibly address a range of tasks including forecasting, interpolation, and their combinations. Building on copula theory, we propose a simplified objective for the recently-introduced transformer-based attentional copulas (TACTiS), wherein the number of distributional parameters now scales linearly with the number of variables instead of factorially. The new objective requires the introduction of a training curriculum, which goes hand-in-hand with necessary changes to the original architecture. We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks, while maintaining the flexibility of prior work, such as seamless handling of unaligned and unevenly-sampled time series. Code is made available at https://github.com/ServiceNow/TACTiS.
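
To make the copula factorization that the abstract builds on concrete, the sketch below uses Sklar's theorem: a joint density can be written as a copula density evaluated at the marginal CDFs, multiplied by the marginal densities. The example swaps in a bivariate Gaussian copula with normal marginals purely for illustration. It is not the attentional copula of TACTiS or TACTiS-2, and every function name here is hypothetical.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used to map uniforms to z-scores


def gaussian_copula_density(u1, u2, rho):
    """Density of a bivariate Gaussian copula at (u1, u2), both in (0, 1)."""
    z1 = STD_NORMAL.inv_cdf(u1)
    z2 = STD_NORMAL.inv_cdf(u2)
    one_minus_r2 = 1.0 - rho * rho
    exponent = -(rho * rho * (z1 * z1 + z2 * z2) - 2.0 * rho * z1 * z2) / (
        2.0 * one_minus_r2
    )
    return exp(exponent) / sqrt(one_minus_r2)


def joint_density_via_copula(x1, x2, marginal1, marginal2, rho):
    """Sklar factorization: c(F1(x1), F2(x2)) * f1(x1) * f2(x2)."""
    u1 = marginal1.cdf(x1)  # marginal CDFs map each variable to (0, 1)
    u2 = marginal2.cdf(x2)
    copula = gaussian_copula_density(u1, u2, rho)
    return copula * marginal1.pdf(x1) * marginal2.pdf(x2)


def bivariate_normal_density(x1, x2, marginal1, marginal2, rho):
    """Closed-form bivariate normal density, used only as a cross-check."""
    z1 = (x1 - marginal1.mean) / marginal1.stdev
    z2 = (x2 - marginal2.mean) / marginal2.stdev
    one_minus_r2 = 1.0 - rho * rho
    quad = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (2.0 * one_minus_r2)
    norm = 2.0 * pi * marginal1.stdev * marginal2.stdev * sqrt(one_minus_r2)
    return exp(-quad) / norm


if __name__ == "__main__":
    m1 = NormalDist(mu=1.0, sigma=2.0)    # assumed marginal of variable 1
    m2 = NormalDist(mu=-0.5, sigma=0.5)   # assumed marginal of variable 2
    via_copula = joint_density_via_copula(1.5, -0.3, m1, m2, rho=0.6)
    direct = bivariate_normal_density(1.5, -0.3, m1, m2, rho=0.6)
    print(via_copula, direct)  # the two values should agree closely
```

In this toy setting each marginal carries its own parameters while the dependence structure sits entirely in the copula term, which is the separation that copula-based models such as the one described above rely on.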

Paper Structure (40 sections, 2 theorems, 25 equations, 15 figures, 12 tables)

This paper contains 40 sections, 2 theorems, 25 equations, 15 figures, 12 tables.

Key Result

Proposition 1

(Invalid Solutions) Assuming that all random variables $X_1, \ldots, X_d$ have continuous marginal distributions and assuming infinite expressivity for $\{F_{\phi_i}\}_{i = 1}^d$ and $c_{\phi_c}$, Problem (eq:general-opt-problem) has infinitely many invalid solutions wherein $c_{\phi_c}$ is not the

Figures (15)

  • Figure 1: TACTiS-2 outperforms TACTiS in (i) density estimation (lower validation negative log-likelihoods, NLL) and (ii) training compute (fewer floating point operations, FLOPs) in real-world forecasting tasks (see the results section).
  • Figure 2: The TACTiS-2 architecture with the dual encoder and the decoder. The training curriculum (bottom right) shows the proposed two-stage approach.
  • Figure 3: The density of the learned copula (contours) closely matches that of the ground truth (colors).
  • Figure 4: TACTiS-2 converges to better NLLs using fewer FLOPs than both TACTiS and an ablation that trains all parameters jointly without the two-stage curriculum. Vertical bars indicate the latest convergence point over 5 runs with a maximum duration of three days.
  • Figure 5: An illustration of the flexibility of TACTiS-2.
  • ...and 10 more figures

Theorems & Definitions (6)

  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • proof
  • proof